Scraping book titles from a novel site with urllib and BeautifulSoup, and fixing the bug: 'NoneType' object has no attribute 'find_all'

Latest recommended article published on 2025-07-02 13:31:46

重装系统20块谢谢


License: CC 4.0 BY-SA

Categories: web scraping, urllib, BeautifulSoup

Article link: https://blog.youkuaiyun.com/qq_37828633/article/details/80641431

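This article shows how to fetch data from a novel site with Python's urllib library and parse it with BeautifulSoup. It singles out a common error in the parsing step: calling 'find_all' on an object that is None raises AttributeError. The fix is to make sure the object has been correctly initialized before the method is called.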


First, pick a website, for example:

urlHTML='http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book'

Next, fetch the data with the urllib library and store the response in a variable:

import urllib.request

request_data=urllib.request.urlopen(urlHTML)
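A bare urlopen call raises an exception if the site refuses or drops the request, so it can help to wrap the fetch. The following is a minimal sketch, not the author's code: the helper names build_request and fetch_html, the User-Agent value and the timeout are all assumptions made for illustration.

```python
import urllib.request

URL = "http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"


def build_request(url):
    """Return a Request carrying a browser-like User-Agent (an assumed value)."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, timeout=10):
    """Download the page and return its bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print(f"request failed: {exc}")
        return None


# Usage (performs a real network request, so it is left commented out):
# request_data = fetch_html(URL)
```

Returning None on failure means the caller has to check the result before handing it to the parser, which is the same kind of check that prevents the NoneType error discussed in this article.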

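Parse the page with BeautifulSoup and save the result. The original note at this step says that the second argument must not be written with single quotes, otherwise the bug NoneType object has no attribute 'find_all' appears.

Python treats 'html.parser' and "html.parser" as the same string, so the quoting style is unlikely to be the real cause. This error is raised when find_all is called on None, which typically happens because an earlier find() call matched nothing.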

from bs4 import BeautifulSoup

soup=BeautifulSoup(request_data,"html.parser")
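To make the error concrete, here is a small sketch that runs on an inline HTML string instead of the live page. The markup, the class names and the helper find_all_safe are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (hypothetical markup, not the real Douban page).
html = (
    "<html><body>"
    "<div class='book-list'><a class='title'>Book A</a></div>"
    "</body></html>"
)

# Single and double quotes produce the same Python string,
# so both calls build the same parse tree.
soup_single = BeautifulSoup(html, 'html.parser')
soup_double = BeautifulSoup(html, "html.parser")

# find() returns None when no tag matches the filter.
missing = soup_double.find("div", {"class": "no-such-class"})
print(missing)


def find_all_safe(parent, name, attrs=None):
    """Call find_all only when the parent tag exists; otherwise return []."""
    if parent is None:
        return []
    return parent.find_all(name, attrs or {})


book_list = soup_double.find("div", {"class": "book-list"})
titles = [a.get_text() for a in find_all_safe(book_list, "a", {"class": "title"})]
print(titles)
```

Calling missing.find_all(...) directly would raise AttributeError: 'NoneType' object has no attribute 'find_all'; checking for None first turns that crash into an empty result.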



Define the filter rules in a dictionary, then use the find method of the bs4 library to extract the data:

sty={
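The listing breaks off after the opening brace of the dictionary, so its contents are not known. As a rough sketch of the general technique only, an attribute dictionary can be passed to find and find_all as shown here. The HTML, the id "books" and the class "title" are hypothetical stand-ins, not the rules used in the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<div id="books">
  <a class="title" href="/book/1">Book One</a>
  <a class="title" href="/book/2">Book Two</a>
  <a class="more" href="/more">More</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical filter rule: keys are attribute names, values are what they must match.
rule = {"class": "title"}

# Locate the container first, and only search inside it if it was found.
container = soup.find("div", {"id": "books"})
if container is None:
    book_titles = []
else:
    book_titles = [a.get_text(strip=True) for a in container.find_all("a", rule)]

print(book_titles)
```

If the container lookup fails on the real page, printing soup.prettify() and checking the tag and attribute names against the dictionary is usually the quickest way to see why find returned None.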


