A Simple Crawler: douban_list_spider.py

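This article introduces douban_list_spider.py, a simple Python crawler that scrapes entry information for movies, books, or music from Douban. After configuring the variables object, tag_list, and page_num, you can run the crawler and have the results written to the corresponding output file.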

Source code:
https://github.com/CuiBinghua/douban_list_spider/blob/master/douban_list_spider.py



1. Introduction

douban_list_spider.py is a simple crawler that scrapes entry information from Douban Movies, Douban Books, or Douban Music based on keywords.

2. Python Environment

I used Python 2.6.6.
The following Python packages also need to be installed:
$ easy_install requests
$ easy_install BeautifulSoup4
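
Before running the crawler, it can help to confirm that both packages are importable. The snippet below is a minimal sketch of such a check and is not part of douban_list_spider.py. It assumes Python 3 (importlib.util.find_spec is not available in Python 2.6), and the helper name missing_packages is made up for illustration. Note that BeautifulSoup4 is imported under the module name bs4.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # requests is imported as "requests"; BeautifulSoup4 is imported as "bs4".
    missing = missing_packages(["requests", "bs4"])
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("all required packages are available")
```

If the script reports a missing package, install it with the commands above and run the check again.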

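3. Running the Crawler

First, configure the variables object, tag_list, and page_num in douban_list_spider.py.
Then run the following command:
$ python douban_list_spider.py
When it finishes, the output file movie_list.txt, book_list.txt, or music_list.txt will be in the same directory.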
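
To make the roles of the three variables concrete, here is a rough sketch of how object, tag_list, and page_num could drive the list of pages to fetch and the name of the output file. It is an illustration under stated assumptions, not the code from douban_list_spider.py: the URL pattern, the page size of 15, and the helper names build_urls and output_filename are all assumed, and the settings are kept in a dict so that the name object does not shadow the Python built-in. Only the variable names and the output file names come from the article. The sketch assumes Python 3.

```python
from urllib.parse import quote

# Settings named in the article, collected in a dict for this sketch.
config = {
    "object": "movie",           # one of "movie", "book", "music"
    "tag_list": ["科幻", "悬疑"],  # keywords (tags) to search for
    "page_num": 2,               # number of result pages per tag
}

# Assumptions for illustration only: the real script may use a
# different URL pattern and a different number of items per page.
BASE_URL = "https://www.douban.com/tag/{tag}/{kind}?start={start}"
ITEMS_PER_PAGE = 15


def build_urls(kind, tags, pages):
    """Return one listing URL per (tag, page) pair."""
    urls = []
    for tag in tags:
        for page in range(pages):
            urls.append(BASE_URL.format(
                tag=quote(tag),                # percent-encode non-ASCII tags
                kind=kind,
                start=page * ITEMS_PER_PAGE,   # offset of the first item
            ))
    return urls


def output_filename(kind):
    """Map the item type to the output file name given in the article."""
    return kind + "_list.txt"


if __name__ == "__main__":
    for url in build_urls(config["object"], config["tag_list"],
                          config["page_num"]):
        print(url)
    print("results would be written to", output_filename(config["object"]))
```

With two tags and page_num set to 2, the sketch produces four URLs, which shows why the number of requests grows with both the length of tag_list and the value of page_num.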

4. References

http://plough-man.com/?p=379

https://github.com/plough/myCrawler/blob/master/doubanBook/book_list_spider.py
