Python bs4 parses a selector incorrectly: upgrade the bs4 library or call Scrapy's CSS selector directly

License: CC 4.0 BY-SA

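This article covers how to fix bs4 failing to parse certain CSS selectors, such as 'div.show-status>span:nth-child(3)'. Two fixes work: upgrade bs4 to version 4.7.0 or later, or use Scrapy's CSS selector instead.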

Upgrade bs4 to 4.7.0 or later, after which the selector works. (Recommended)

pip install --upgrade beautifulsoup4
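
To confirm that the upgrade took effect, the installed version can be compared against 4.7.0. The following is a minimal sketch and not part of the original fix: the helper names `parse_version` and `supports_nth_child` are hypothetical, and it assumes a plain dotted version string such as `4.12.3`.

```python
import re
from importlib import metadata

MIN_VERSION = (4, 7, 0)  # threshold named in this article


def parse_version(version: str) -> tuple:
    """Turn '4.12.3' into (4, 12, 3), using only the leading digits of each part."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '4.7' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_nth_child(version: str) -> bool:
    """True if this bs4 version is at or above the 4.7.0 threshold."""
    return parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        print("beautifulsoup4 is not installed")
    else:
        status = "OK" if supports_nth_child(installed) else "needs upgrade"
        print(installed, status)
```

Run as a script, it prints the installed version together with `OK` or `needs upgrade`.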

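For example, older bs4 versions cannot parse the rule below. They raise an error saying that nth-child must be changed to nth-of-type:

div.show-status > span:nth-child(3)

Switching to nth-of-type does avoid the error, but it then selects the wrong element.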
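
The two pseudo-classes count differently: `:nth-child(3)` matches an element that is the third child of its parent regardless of tag, while `:nth-of-type(3)` matches the third sibling with the same tag. The sketch below imitates both counts with the standard library only. The markup and the helper names `nth_child` and `nth_of_type` are made up for illustration and do not reflect the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an <a> sits between the <span> elements
HTML = """
<div class="show-status">
  <span>author</span>
  <a href="#">source</a>
  <span>2019-03-07</span>
  <span>views</span>
</div>
""".strip()


def nth_child(parent, tag, n):
    """Return the n-th child of parent (1-based) if it has the given tag, else None."""
    children = list(parent)
    if n <= len(children) and children[n - 1].tag == tag:
        return children[n - 1]
    return None


def nth_of_type(parent, tag, n):
    """Return the n-th child (1-based) counting only children with the given tag."""
    same_tag = [child for child in parent if child.tag == tag]
    return same_tag[n - 1] if n <= len(same_tag) else None


div = ET.fromstring(HTML)
by_child = nth_child(div, "span", 3)
by_type = nth_of_type(div, "span", 3)

print(by_child.text)  # 2019-03-07
print(by_type.text)   # views
```

Whenever a parent mixes tags like this, the two rules land on different elements, which is why swapping one for the other silences the error but returns the wrong node.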

Workaround:

Call Scrapy's CSS selector

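import requests
from scrapy.selector import Selector

url = 'http://www.tchuan.gov.cn/show/2019/03/07/36310.html'

# Rules tested on this page:
# ['h1.show-title', 'div.show-status > span:nth-child(3)', '#content']
response = requests.get(url)
selector = Selector(text=response.text)
# .get() returns the first match, or None when nothing matches
result = selector.css('div.show-status > span:nth-child(3)').get()

print(selector)  # the Selector object itself
print(result)    # HTML of the matched <span>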