Python Web Scraper (Using requests)

Scraping Actor Information

Reposted; published 2015-07-17 23:37:00


License: CC 4.0 BY-SA

Original article: http://www.cnblogs.com/premier/p/4655920.html

Tags: #python #web-scraping

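This post shows a small example of scraping a web page with Python's requests library (to fetch the page) and lxml (to parse it). Sending a browser-style User-Agent header makes the request look like it comes from a regular browser, and an XPath query then extracts the actor names from the target site.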
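The summary credits the browser-style header for getting the page. As a minimal offline sketch (assuming requests is installed), the code below shows which User-Agent a request would carry with and without that header. By default, requests sessions send `python-requests/<version>`, which some sites block or answer differently. `prepared_user_agent` and the example.com URL are hypothetical names used only for illustration; nothing is sent over the network.

```python
import requests

# The same browser string used by the scraper in this post.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36")


def prepared_user_agent(url, headers=None):
    """Hypothetical helper: return the User-Agent a Session would send for url.

    Only builds the request (prepare_request); no network I/O happens.
    """
    with requests.Session() as session:
        if headers:
            # Custom headers override the session defaults, including User-Agent.
            session.headers.update(headers)
        prepared = session.prepare_request(requests.Request("GET", url))
        return prepared.headers["User-Agent"]


if __name__ == "__main__":
    url = "http://example.com/"  # placeholder URL, never contacted
    print(prepared_user_agent(url))                              # python-requests/<version>
    print(prepared_user_agent(url, {"User-Agent": BROWSER_UA}))  # the Chrome string above
```

`requests.get(url, headers=headers)` works the same way internally: it creates a session and merges the per-request headers over the session defaults, so the custom User-Agent replaces the default one.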

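import requests
from lxml import etree

url = "http://avdb.la/actor/"

# Send a browser-style User-Agent instead of the default "python-requests/x.y".
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36"}

# A timeout keeps the script from hanging forever on an unresponsive server.
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page

# response.text is already a decoded str, so no manual .encode("utf8") is needed in Python 3.
content = response.text

# Parse the HTML into an element tree that supports XPath queries.
selector = etree.HTML(content)

# Collect the title attribute of each link in the #waterfall container (the actor names).
names = selector.xpath('//*[@id="waterfall"]/div/a/@title')
for name in names:
    print(name)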
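To see what the XPath in the scraper matches without contacting the site, this sketch runs the same expression against a small hypothetical HTML snippet. `SAMPLE_HTML` and `extract_titles` are illustrative names, and the real page's markup may differ. Read the expression as: the element with `id="waterfall"`, then its direct `<div>` children, then their direct `<a>` children, then each link's `title` attribute.

```python
from lxml import etree

# Hypothetical markup shaped like what the scraper's XPath expects.
SAMPLE_HTML = """
<html><body>
  <div id="waterfall">
    <div><a href="/actor/1" title="Actor A"><img src="a.jpg"></a></div>
    <div><a href="/actor/2" title="Actor B"><img src="b.jpg"></a></div>
    <div><span>not a link</span></div>
  </div>
  <div><a href="/other" title="Outside waterfall"></a></div>
</body></html>
"""


def extract_titles(html_text):
    """Return the title of every <a> that is a direct child of a <div>
    directly under the element whose id is "waterfall"."""
    tree = etree.HTML(html_text)
    return tree.xpath('//*[@id="waterfall"]/div/a/@title')


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)  # the link outside #waterfall is not matched
```

Because the path uses `/` (direct child) rather than `//`, a link wrapped in an extra element inside each card would not match. If the site's markup is nested more deeply, `//*[@id="waterfall"]//a/@title` is a looser alternative.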