I want to scrape some URLs with BeautifulSoup. The URLs come from a Google Analytics API call, and some of them don't work, so I need a way to skip them.
I tried adding this:

    except urllib2.HTTPError:
        continue
But that gave me a syntax error.
Here is my complete code:

    import urllib2
    from bs4 import BeautifulSoup

    rawdata = []
    urllist = []
    sharelist = []
    mystring = 'http://www.konbini.com'

    def print_results(results):
        # Print data nicely for the user.
        if results:
            for row in results.get('rows'):
                rawdata.append(row[0])
        else:
            print 'No results found'
        urllist = [mystring + x for x in rawdata]
        for row in urllist:
            # query the website and return the html to the variable 'page'
            page = urllib2.urlopen(row)
            except urllib2.HTTPError:  # the two lines I added; there is no matching try
                continue
            soup = BeautifulSoup(page, 'html.parser')
            # Take out the
            name_box = soup.find(attrs={'class': 'nb-shares'})
            if name_box is None:
                continue
            share = name_box.text.strip() # strip() is used to remove starting and trailing
            # save the data in tuple
            sharelist.append((row, share))
        print(sharelist)
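
For context, an `except` clause is only valid directly after a `try` block, so the call that can fail has to sit inside `try:`. Below is a minimal sketch of that pattern, not a drop-in fix. It assumes Python 3's `urllib.request` and `urllib.error` in place of `urllib2`, and the names `fetch_pages` and `opener` are hypothetical, introduced only for illustration.

```python
import urllib.error
import urllib.request


def fetch_pages(urls, opener=urllib.request.urlopen):
    """Yield (url, raw_html) for every URL that loads; skip the ones that fail.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    for url in urls:
        try:
            # The call that may fail goes inside the try block.
            response = opener(url)
        except urllib.error.HTTPError as err:
            # Server answered with an error status (404, 500, ...).
            print('Skipping %s: HTTP %s' % (url, err.code))
            continue
        except urllib.error.URLError as err:
            # Could not reach the server at all (DNS failure, refused, ...).
            print('Skipping %s: %s' % (url, err.reason))
            continue
        yield url, response.read()
```

Each `raw_html` value could then be passed to `BeautifulSoup(raw_html, 'html.parser')` as in the loop above. Under Python 2 the same structure should work with `urllib2.urlopen` and `urllib2.HTTPError` / `urllib2.URLError`.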