Problems Encountered When Crawling Baidu Map POIs, and How I Solved Them
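The crawl was interrupted many times. At first it failed with Error 10054 (connection reset by the remote host) and Error 10060 (connection timed out).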
After searching for solutions, I found that the first error (10054) can be fixed by adding a request header, as follows:
(Reference: https://blog.youkuaiyun.com/qq_34369025/article/details/56487298)
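import urllib2

# Send a browser-like User-Agent; this is the header fix for Error 10054
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0'}
request = urllib2.Request(url, headers=headers)  # url: the POI request URL being crawled
f = urllib2.urlopen(request)
html = f.read()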
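`urllib2` exists only in Python 2. In Python 3 the same calls live in `urllib.request`. Below is a minimal Python 3 sketch of the same header fix. It assumes that a browser-like User-Agent is all the server checks. The helper names `build_request` and `fetch_html` are hypothetical, and so is the `timeout` argument, which the original code does not set.

```python
import urllib.request

# Browser-like User-Agent string, the same value as in the Python 2 code above
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) '
              'Gecko/20100101 Firefox/22.0')


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: build a Request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Hypothetical helper: send the request and return the raw body as bytes.

    The timeout (seconds) is an assumption, not part of the original code.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as f:
        return f.read()


if __name__ == '__main__':
    req = build_request('https://example.com/')
    # urllib stores header names via str.capitalize(), so the key is 'User-agent'
    print(req.get_header('User-agent'))
```

In Python 3, `read()` returns bytes, so decode the result (for example with `.decode('utf-8')`) before parsing it as text.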
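Online sources attribute the second error (10060) to an unstable network connection. I therefore handled it with a function that retries the connection several times, as follows: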
(Reference: https://blog.youkuaiyun.com/u011350541/article/details/52331819)
def getUrl_multiTry(url):
    # Browser-like User-Agent string sent with each attempt
    user_agent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36'
    headers = {
        'User-Agent': user_agent}
    maxTryNum<