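Crawling individual web pages: Requests. Crawling whole websites: Scrapy. Crawling the entire web: a custom-built crawler of the kind Google runs.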
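Fetching the information on a JD.com product page
import requests

url = 'http://item.jd.com/2967929.html'
try:
    r = requests.get(url)
    # Raise an HTTPError if the response status is a 4xx or 5xx error
    r.raise_for_status()
    # Use the encoding guessed from the body instead of the header default
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")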
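The same pattern can be wrapped in a function, which is where a `return` of the failure message would make sense. The following is only a minimal sketch: the name `get_html_text`, the `timeout` value, and the 1000-character default are assumptions for illustration, not part of the original notes.

```python
import requests


def get_html_text(url, max_chars=1000, timeout=30):
    """Return the first max_chars characters of the page, or a failure message.

    The function name, max_chars and timeout are illustrative assumptions.
    """
    try:
        r = requests.get(url, timeout=timeout)
        # Raise an HTTPError if the response status is a 4xx or 5xx error
        r.raise_for_status()
        # Use the encoding guessed from the body instead of the header default
        r.encoding = r.apparent_encoding
        return r.text[:max_chars]
    except requests.RequestException:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        return "Crawl failed"
```

Calling `get_html_text('http://item.jd.com/2967929.html')` would then return either the start of the page or the string "Crawl failed", without the caller needing its own try/except.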
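Fetching the information on an Amazon page
import requests

url = 'https://www.amazon.cn/gp/yourstore/home/ref=nav_cs_ys'
try:
    # Present a browser-like User-Agent instead of the default one
    kv = {"user-agent": "Mozilla/5.0"}
    # The keyword argument is `headers`, not `header`
    r = requests.get(url, headers=kv)
    # Raise an HTTPError if the response status is a 4xx or 5xx error
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")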
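The reason for passing a `user-agent` header is presumably that requests identifies itself as a Python client by default, which some sites may reject. The sketch below prepares the request without sending it, so the effect of the header can be inspected offline. Going through `Session.prepare_request` is a choice made for this illustration; the original notes simply call `requests.get`.

```python
import requests

# The User-Agent string requests sends when none is supplied
default_ua = requests.utils.default_user_agent()

# Build and prepare the request without sending it over the network
kv = {"user-agent": "Mozilla/5.0"}
url = "https://www.amazon.cn/gp/yourstore/home/ref=nav_cs_ys"
req = requests.Request("GET", url, headers=kv)
prepared = requests.Session().prepare_request(req)

print("default User-Agent: ", default_ua)
print("User-Agent to send: ", prepared.headers["User-Agent"])
```

Header names are matched case-insensitively, so the lowercase `user-agent` key replaces the session's default `User-Agent` entry.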
Submitting search keywords to Baidu and 360
import requests

keyword = "python"
try:
    # Baidu takes the search keyword in the `wd` query parameter
    kv = {"wd": keyword}
    r = requests.get("http://bai