python_BeautifulSoup_beautifulsoup.py

Article link: https://blog.youkuaiyun.com/Py_CCY/article/details/74081407

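This article shows how to parse HTML documents with Python's BeautifulSoup library. It walks through fetching a web page as an example and introduces the basic elements of the BeautifulSoup class: Tag, Name, Attributes, and NavigableString.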


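Everything in this post is a summary of what I learned from a video course. If the lecturer feels any of it amounts to plagiarism, please contact me.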

Understanding the Beautiful Soup library:

Example:

<p class="title">The demo python introduces several python courses.</p>

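<p>...</p> is a tag pair, called a Tag.

p is the name of the tag.

class="title" is an attribute (Attributes).

Beautiful Soup parses an HTML document into a tag tree and exposes that tree as a BeautifulSoup object.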
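
The three pieces named above (tag, name, attributes) can be read straight off a parsed tree. The following is a minimal sketch, assuming an inline HTML string modeled on the example line rather than a downloaded page; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Assumed inline snippet, modeled on the example line in this post.
html = '<p class="title">The demo python introduces several python courses.</p>'

# Build the tag tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

tag = soup.p            # the first <p> tag in the tree
tag_name = tag.name     # the tag's name
tag_attrs = tag.attrs   # the tag's attributes as a dictionary
tag_text = tag.string   # the text between <p> and </p>

print(tag_name)
print(tag_attrs)
print(tag_text)
```

Note that class comes back as a list, because an HTML element may carry several classes at once.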

Parsers available to Beautiful Soup (mk below stands for the markup to parse):

bs4's HTML parser: BeautifulSoup(mk,'html.parser'). Requirement: the bs4 library is installed.

lxml's HTML parser: BeautifulSoup(mk,'lxml'). Requirement: pip install lxml

lxml's XML parser: BeautifulSoup(mk,'xml'). Requirement: pip install lxml

html5lib's parser: BeautifulSoup(mk,'html5lib'). Requirement: pip install html5lib
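
Since lxml and html5lib are optional installs, one possible pattern is to try them in order and fall back to 'html.parser', which is always available. This is only a sketch: make_soup is a hypothetical helper name, and the preference order is an assumption, not something the course prescribes.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) for the first parser that is installed.

    Hypothetical helper; the default preference order is an assumption.
    """
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is missing.
            continue
    raise RuntimeError("no usable parser found")


soup, used = make_soup("<p>hello</p>")
print(used)            # whichever parser was found first
print(soup.p.string)
```

Different parsers may repair broken markup differently, so pinning one parser explicitly is usually the safer choice once you know which ones are installed.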

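Example:

import requests
from bs4 import BeautifulSoup  # import the BeautifulSoup class from the bs4 library

try:
    r = requests.get("http://python123.io/ws/demo.html", timeout=30)
    r.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding guessed from the content
    demo = r.text
    # demo holds the HTML text, so parsing the page means parsing demo
    soup = BeautifulSoup(demo, 'html.parser')  # parse
    print(soup)
except requests.RequestException:
    print("Request failed")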

Basic elements of the BeautifulSoup class

Tag: a tag, the most basic unit for organizing information, opened with <> and closed with </>.

from bs4 import BeautifulSoup
import requests
r = requests.get("http://python123.io/ws/demo.html", timeout=30)
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
# print the title tag of the document
print(soup.title)
# print the first a tag in the document
print(soup.a)
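
Accessing a tag as an attribute, as in soup.a, returns only the first match, and returns None when the tag does not exist. A minimal sketch of that behavior follows, assuming a made-up inline snippet with two links (the example.com URLs are placeholders); find_all is used here to list every match.

```python
from bs4 import BeautifulSoup

# Assumed markup with two links; the URLs are placeholders.
html = (
    '<p>Courses: '
    '<a href="http://example.com/basic" id="link1">Basic Python</a> and '
    '<a href="http://example.com/advanced" id="link2">Advanced Python</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a                 # only the first <a> tag
all_links = soup.find_all("a")      # every <a> tag, in document order
missing = soup.title                # None: this snippet has no <title>

print(first_link["id"])
print([a["id"] for a in all_links])
print(missing)
```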

Name: the name of a tag; the name of <p>...</p> is 'p'. Format: <tag>.name

from bs4 import BeautifulSoup
import requests
r = requests.get("http://python123.io/ws/demo.html", timeout=30)
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
# print the name of the a tag, of its parent, and of its grandparent
print(soup.a.name)
print(soup.a.parent.name)
print(soup.a.parent.parent.name)
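
Rather than chaining .parent by hand, the whole ancestor chain can be walked with .parents. The sketch below assumes a small inline document instead of the demo page; the last ancestor is the BeautifulSoup object itself, whose name is '[document]'.

```python
from bs4 import BeautifulSoup

# Assumed inline document, small enough to see the whole tree.
html = "<html><body><p><a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .parents yields every ancestor, from the nearest one up to the document.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']
```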

Attributes: the attributes of a tag, organized as a dictionary. Format: <tag>.attrs

from bs4 import BeautifulSoup
import requests
r = requests.get("http://python123.io/ws/demo.html", timeout=30)
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
# attributes of a tag: pick the tag first, then read its attributes
tag = soup.a  # the a tag: <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
print(tag.attrs)  # prints {'href': 'http://www.icourse163.org/course/BIT-268001', 'id': 'link1', 'class': ['py1']}
print(tag.attrs.values())  # prints the values stored in the attribute dictionary
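
Because attrs is a dictionary, a single attribute can also be read directly from the tag. A minimal sketch, assuming the a tag quoted in the comment above is parsed from an inline string; the link text "Basic Python" is a placeholder added for illustration.

```python
from bs4 import BeautifulSoup

# The opening tag is the one quoted above; the link text is a placeholder.
html = ('<a class="py1" href="http://www.icourse163.org/course/BIT-268001" '
        'id="link1">Basic Python</a>')
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

href = tag["href"]           # same as tag.attrs["href"]
classes = tag["class"]       # class is multi-valued, so this is a list
title = tag.get("title")     # .get() returns None for a missing attribute
keys = sorted(tag.attrs)     # attribute names, sorted for stable output

print(href)
print(classes)
print(title)
print(keys)
```

Using tag["title"] on a missing attribute would raise KeyError, which is why .get() is the gentler choice when an attribute may be absent.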

NavigableString: the non-attribute string inside a tag, that is, the text between <> and </>. Format: <tag>.string
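
To illustrate <tag>.string, here is a small sketch assuming two made-up paragraphs: one containing only text, and one mixing text with a nested tag. In the mixed case .string is None, and get_text() can be used to collect all the text instead.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed markup: a text-only paragraph and a paragraph with a nested tag.
html = '<p class="title">plain text</p><p>one <i>two</i></p>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

text = first.string                          # the string inside the first <p>
is_nav = isinstance(text, NavigableString)   # it is a NavigableString, not a plain str
mixed = second.string                        # None: more than one child node
joined = second.get_text()                   # all text in the tag, concatenated

print(text, is_nav)
print(mixed)
print(joined)
```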