Douban Movies Scraper: A Rough Version

Latest recommended article published on 2024-03-14 18:24:52

hiyunie

Category: Web scraping

Article link: https://blog.youkuaiyun.com/qq_45202835/article/details/104721425

Tools and environment

PyCharm Professional
Python 3.6
requests library
lxml library

Experiment code

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
}
url = 'https://movie.douban.com/cinema/nowplaying/ganzhou/'
film_lists = []


# Fetch the page and extract each film's details
def information():
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on an HTTP error status
    html = etree.HTML(response.text)
    # Index 1 takes the second <ul class="lists"> matched on the page
    ul = html.xpath("//ul[@class='lists']")[1]
    lis = ul.xpath('./li')
    for li in lis:
        # li.get() returns the attribute as a string (None if it is missing),
        # whereas li.xpath('./@data-title') would return a one-item list
        dicts = {
            'title': li.get('data-title'),
            'duration': li.get('data-duration'),
            'region': li.get('data-region'),
            'actors': li.get('data-actors'),
        }
        film_lists.append(dicts)


# Display the data
def show_data():
    for film in film_lists:
        for k, v in film.items():
            print(k, v)
        print('---------------------')


if __name__ == '__main__':
    information()
    show_data()
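
The extraction step can be tried without hitting the live site. The sketch below is an illustration, not part of the original script. It uses the standard library's `xml.etree.ElementTree` on a small, hand-written sample, so it runs offline and needs no third-party packages. The sample markup, the `FIELDS` tuple, and the `extract_films` helper are all hypothetical names introduced here. The sample only assumes the page carries film details in `data-*` attributes on `<li>` elements, as the script above does. Real pages are rarely well-formed XML, so `etree.HTML` from lxml is still the right parser for them. lxml elements provide the same `findall` and `get` methods, so the helper should accept either kind of root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample markup (not copied from the real page)
SAMPLE = """
<div>
  <ul class="lists">
    <li data-title="Film A" data-duration="100 min" data-region="China" data-actors="Actor 1 / Actor 2"></li>
  </ul>
  <ul class="lists">
    <li data-title="Film B" data-duration="95 min" data-region="Japan" data-actors="Actor 3"></li>
    <li data-title="Film C" data-region="France" data-actors="Actor 4"></li>
  </ul>
</div>
"""

FIELDS = ('title', 'duration', 'region', 'actors')


def extract_films(root, list_index=1):
    """Return one dict per <li> inside the chosen <ul class="lists">."""
    lists = root.findall(".//ul[@class='lists']")
    if list_index >= len(lists):
        return []  # the layout differs from what we assumed
    films = []
    for li in lists[list_index].findall('./li'):
        # .get() yields a plain string, or None when the attribute is missing
        films.append({field: li.get('data-' + field) for field in FIELDS})
    return films


films = extract_films(ET.fromstring(SAMPLE))
for film in films:
    print(film)
```

Returning a list from the helper, rather than appending to a module-level variable, makes the parsing logic easy to check on fixed input. Returning an empty list when the expected `<ul>` is missing avoids an `IndexError` if the page layout changes.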