requests_xcdl (requests_Xici proxy)

Latest recommended article published on 2019-01-08 10:10:31

jingwenliu

Licensed under CC 4.0 BY-SA

Category: Web crawlers

Article link: https://blog.youkuaiyun.com/jingwenliu/article/details/81712233

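This article shows how to fetch the Xici proxy site (xicidaili.com) with Python's requests library, using a proxy and custom headers. With the proxy and headers set, the request gets past some of the site's anti-crawler checks, and the page content is saved to a local file.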

import requests

url = 'http://www.xicidaili.com'

# Route HTTP traffic through an authenticated proxy
proxy = {
    'http': 'http://root:Yao+ql2011@101.200.50.18:8118'
}

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

# Call requests through the proxy and get the response
response = requests.get(url, headers=headers, proxies=proxy, timeout=10)
response.raise_for_status()  # stop here on an HTTP error status
print(response.text)

# Write the raw bytes so the page is saved exactly as received
with open('xicidaili.html', 'wb') as f:
    f.write(response.content)
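
The proxy URL in the script is written by hand, which works as long as the username and password contain no characters that also act as URL delimiters (such as "@", ":" or "/"). As a minimal sketch, assuming you want to build the dictionary from separate values, the credentials can be percent-encoded first. The helper name `build_proxies` and the placeholder credentials below are hypothetical and not part of the original script.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port):
    """Return a requests-style proxies dict with percent-encoded credentials.

    Hypothetical helper for illustration; not part of the original script.
    """
    # safe="" makes quote() encode every reserved character, so "@", ":"
    # and "/" inside the credentials cannot be read as URL delimiters.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    # One entry per target scheme; the original dict only defines "http".
    return {"http": proxy_url, "https": proxy_url}


# Placeholder values, for illustration only.
proxies = build_proxies("user", "p@ss:word", "127.0.0.1", 8118)
print(proxies["http"])  # http://user:p%40ss%3Aword@127.0.0.1:8118
```

The returned dictionary can be passed to `requests.get(url, headers=headers, proxies=proxies)` in place of the hand-written `proxy` dictionary.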