1. Installing and using the selenium module
Driver download: http://chromedriver.storage.googleapis.com/index.html
https://www.cnblogs.com/Python666/p/7816653.html
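2. Simulating keyboard and mouse input:
Import the pyautogui package; its press function simulates pressing a key.
For example, pyautogui.press('space') presses the space bar.
See: https://www.jianshu.com/p/41463c82ec8f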
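A minimal sketch of pressing several keys in a row with pyautogui.press. The helper press_sequence and its press/pause parameters are made up for this example; passing a different press callable is only a convenience for trying the logic without a graphical session.

```python
import time


def press_sequence(keys, press=None, pause=0.1):
    """Press each key in order, waiting `pause` seconds between presses."""
    if press is None:
        # Imported lazily because pyautogui needs a graphical session
        import pyautogui
        press = pyautogui.press
    for key in keys:
        press(key)
        time.sleep(pause)
```

Example call: press_sequence(['space', 'enter']) presses the space bar, then Enter.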
3. Basic usage of GET and POST requests with the requests package for web scraping
http://www.cnblogs.com/nizhihong/p/6567928.html
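A small sketch of the difference between the two request types, assuming the requests package is installed. The example.com URLs and field names are placeholders. The requests are only prepared here, not sent, so the resulting URL and body can be inspected without network access.

```python
import requests

# GET: query parameters go in `params` and are appended to the URL
get_req = requests.Request(
    "GET", "http://example.com/search", params={"q": "python"}
).prepare()

# POST: form fields go in `data` and are sent in the request body
post_req = requests.Request(
    "POST", "http://example.com/login", data={"user": "alice"}
).prepare()


def send(prepared, timeout=10):
    """Send a prepared request and return the response."""
    with requests.Session() as session:
        return session.send(prepared, timeout=timeout)
```

In everyday use the shortcuts requests.get(url, params=...) and requests.post(url, data=...) build and send the request in one step.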
4. Basic usage of the lxml library
https://www.cnblogs.com/BigFishFly/p/6380016.html
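A minimal sketch of parsing HTML with lxml and querying it with XPath, assuming lxml is installed. The HTML string is invented for illustration; in a scraper it would usually be the text of a downloaded page.

```python
from lxml import etree

html = (
    "<html><body><ul>"
    "<li><a href='/a'>First</a></li>"
    "<li><a href='/b'>Second</a></li>"
    "</ul></body></html>"
)

# Parse the string into an element tree
tree = etree.HTML(html)

# XPath queries return lists: attribute values and text nodes
links = tree.xpath("//li/a/@href")
texts = tree.xpath("//li/a/text()")
```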
5. Using a proxy IP with requests:
Original: https://blog.youkuaiyun.com/you_are_my_dream/article/details/60333331
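import requests

proxies = {
    "http": "http://111.155.124.78:8123",  # proxy IP
}
headers = {
    # The header name is "User-Agent" (with a hyphen, not an underscore)
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Referer": "http://www.xicidaili.com/nn/1",
}
http_url = "http://www.xicidaili.com/nn/1"

try:
    res = requests.get(url=http_url, headers=headers, proxies=proxies, timeout=30)
except requests.exceptions.RequestException as exc:
    # An unreachable proxy usually raises instead of returning a response
    print("Proxy IP error:", exc)
else:
    if res.status_code == 200:
        print("Page fetched successfully")
    else:
        print("Proxy IP error, status code:", res.status_code)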
6. SSL certificate verification issues:
https://www.cnblogs.com/fh-fendou/p/7479812.html
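A hedged sketch of the usual options in requests, which controls certificate checking through verify: True (the default), False, or a path to a CA bundle. The helper make_session and its parameters are made up for this example. Turning verification off removes protection against man-in-the-middle attacks, so it is best kept to testing.

```python
import requests
import urllib3


def make_session(ca_bundle=None, insecure=False):
    """Return a requests.Session with the chosen certificate policy."""
    session = requests.Session()
    if insecure:
        # Skip certificate checks and silence the warning requests would print
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        session.verify = False
    elif ca_bundle is not None:
        # Verify against a specific CA bundle file instead of the default one
        session.verify = ca_bundle
    return session
```

Example call: make_session(insecure=True).get(url, timeout=10) fetches a page whose certificate cannot be validated. For a single request, requests.get(url, verify=False) has the same effect.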
7. The scrapy framework
https://www.cnblogs.com/lyrichu/p/6732874.html