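Before we start:
Yesterday I covered how to drive a browser with selenium to bump view counts. After running it overnight I got blocked: opening the blog link now goes straight to the login page. Infuriating!
That one is partly on me for rushing it yesterday; I set the interval between requests too short.
Still, for every countermeasure there is a workaround,
so today I'm doing it directly with urllib!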
import time
import threading
from urllib import request, error
import ssl

# Disable HTTPS certificate verification globally (explained below)
ssl._create_default_https_context = ssl._create_unverified_context


def login():
    # Base URL of the blog posts, followed by the article IDs to visit
    i = 'https://blog.youkuaiyun.com/ssjdoudou/article/details/'
    j = ['84305473', '84312675', '84261341', '84257207', '84261078', '84204133', '84203222', '84202708', '84194532', '84194644',
         '84142602', '84189684', '83999415', '84146012', '84105110', '84109362', '83987504', '83545438', '84023801', '84036156', '83934801',
         '83931392', '83927099', '83901600', '83869532', '84076067', '83860369', '83832502', '83796792', '83794522', '83794429',
         '83786919', '83758980', '83720386', '83719691', '83714375', '83692895', '83692501', '83661576', '83658652', '83627202', '83620957',
         '83592276', '83586587', '83582300', '83541030', '83508015', '83477144', '83473955', '83420467', '83412751', '83387953', '83382481',
         '83353600', '83352361', '83352066', '83351442', '83317964', '83311866', '83280985', '83276624', '83240486',
         '83217851', '83189452', '83154903']
    print(len(j))
    count = 0
    # Browser-like User-Agent; the value must not repeat the "User-Agent:" name
    header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    for k in range(len(j)):
        s = i + j[k]
        # No data argument here: passing data would turn the request into a POST
        myrequest = request.Request(s, headers=header)
        rec = request.urlopen(myrequest)  # Send a GET request to fetch the blog post page
        page = rec.read()
        count += 1
        print(count)


def mytime():
    while 1:
        try:
            login()
        except (ConnectionResetError, error.URLError) as e:
            # HTTPError is a subclass of URLError; log it and try again next round
            print(e)
        time.sleep(75)


if __name__ == "__main__":
    # Pass the function itself; target=mytime() would run the loop in the main thread
    t = threading.Thread(target=mytime)
    t.start()
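A note on the thread at the end of the script: `threading.Thread` expects the function object in `target` and calls it on the new thread once `start()` runs, whereas writing `target=f()` would run `f` on the calling thread first. A minimal sketch, using a hypothetical `worker` function and thread name that are not part of the script above:

```python
import threading


def worker(results):
    # Record which thread actually ran this function
    results.append(threading.current_thread().name)


results = []
# target is the function object; arguments go in args
t = threading.Thread(target=worker, args=(results,), name="refresher")
t.start()
t.join()  # wait for the worker to finish
print(results)
```
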
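Two things to note.
First, ssl.
Starting with Python 2.7.9 (and 3.4.3 on the Python 3 side), urlopen verifies the SSL certificate whenever it opens an https link.
If the target site uses a self-signed certificate, it raises an error like urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)> (on Python 3 the exception is urllib.error.URLError).
The fix is: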
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
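The two lines above switch verification off for every HTTPS call in the process. As an alternative sketch (not what this script does), verification can be disabled for individual requests by passing a context to `urlopen`. The helper names `make_unverified_context` and `fetch` are made up for illustration, and this should only be pointed at sites you trust:

```python
import ssl
from urllib import request


def make_unverified_context():
    # Start from the default context, then turn verification off.
    # check_hostname has to be disabled before verify_mode is set to CERT_NONE.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, ctx):
    # Only this call skips certificate checks; other HTTPS calls are unaffected
    with request.urlopen(url, context=ctx) as resp:
        return resp.read()


ctx = make_unverified_context()
```
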
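Second, always send a header to disguise the request: without a browser-like User-Agent, urllib announces itself as Python-urllib/x.y.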
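A small sketch of how the header ends up on the request, assuming a placeholder URL (`https://example.com/article/1`) in place of a real post. It also shows that a `Request` built without `data` goes out as GET, while supplying `data` makes it a POST:

```python
from urllib import request

URL = "https://example.com/article/1"  # placeholder, not a real article
UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")

# The header value is just the UA string, without a "User-Agent:" prefix
get_req = request.Request(URL, headers={"User-Agent": UA})
post_req = request.Request(URL, data=b"", headers={"User-Agent": UA})

print(get_req.get_method())   # no data, so GET
print(post_req.get_method())  # data supplied, so POST
# Request stores header names capitalized, hence "User-agent" here
print(get_req.get_header("User-agent"))
```

Nothing is sent over the network here; the objects are only built and inspected.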