How to use BeautifulSoup

License: CC 4.0 BY-SA

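This article walks through an example of parsing an HTML document with Python's BeautifulSoup library. It shows how to extract specific tags and their attributes: reading the content of the <title> tag, finding tags that carry a particular attribute value, and getting the text inside those tags.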
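The three tasks listed above map onto a handful of bs4 calls. The following is a minimal sketch. It assumes bs4 (Beautiful Soup 4) is installed and uses the standard-library "html.parser" backend. The markup and variable names are made up for illustration. It also shows why text is usually read from a leaf tag such as <b>: `.string` returns None when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration
html = (
    "<html><head><title>Demo</title></head><body>"
    '<p align="center">First <b>bold</b> note.</p>'
    '<p align="left">Second note.</p>'
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# 1. Text of the <title> tag
title_text = soup.title.string

# 2. Tags filtered by an attribute value; attrs= accepts any attribute name
centered = soup.find_all("p", attrs={"align": "center"})

# 3. Text inside a tag
first_p = centered[0]
print(title_text)          # Demo
print(len(centered))       # 1
print(first_p.string)      # None: the <p> has three children (text, <b>, text)
print(first_p.get_text())  # First bold note.

# .get() returns None for a missing attribute instead of raising KeyError
print(first_p.get("id"))   # None
```

`find_all` is the explicit name behind the shorthand `soup('p')` used in the example below, and `get_text()` is the usual choice when a tag mixes text and child tags.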

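from bs4 import BeautifulSoup

# Sample document. The <p> and <b> tags are what the queries below look for.
doc = ['<html><head><title>Page title</title></head>',
       '<body><p align="center" id="firstpara">This is paragraph <b>one</b>.</p>',
       '<p align="blah" id="secondpara">This is paragraph <b>two</b>.</p>',
       '</body></html>']
# Name the parser explicitly; bs4 warns when none is given
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify())
# <html>
#  <head>
#   <title>
#    Page title
#   </title>
#  </head>
#  <body>
#   <p align="center" id="firstpara">
#    This is paragraph
#    <b>
#     one
#    </b>
#    .
#   </p>
#   <p align="blah" id="secondpara">
#    This is paragraph
#    <b>
#     two
#    </b>
#    .
#   </p>
#  </body>
# </html>

titleTag = soup.html.head.title

print(titleTag)
# <title>Page title</title>
print(titleTag.string)
# Page title
print(len(soup('p')))  # number of <p> tags: 2
print(soup.find('p', align="center"))  # first <p> whose align attribute is "center"
print(soup('p', align="center")[0]['id'])  # id of the first such <p>: firstpara
print(soup.find('p').b.string)  # text of the <b> inside the first <p>: one
print(soup('p')[1].b.string)  # text of the <b> inside the second <p>: two
titleTag['id'] = 'theTitle'  # modify the tree: add an id attribute to <title>
soup.p.extract()  # remove the first <p> from the tree
print(soup)
# <html><head><title id="theTitle">Page title</title></head><body><p align="blah" id="secondpara">This is paragraph <b>two</b>.</p></body></html>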
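The last lines of the example change the tree in place: they assign a new attribute and call `extract()`. The following small sketch goes a step further. It assumes bs4 and "html.parser", and the markup is made up. It shows that `extract()` returns the detached tag, so the tag can be re-inserted elsewhere. It also shows that `del` removes an attribute again.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p id="a">one</p><p id="b">two</p></div>', "html.parser")

# Assigning to a key adds (or overwrites) an attribute
soup.div["class"] = "box"

# extract() detaches the first <p> from the tree and returns it
removed = soup.p.extract()
print(removed)  # <p id="a">one</p>
print(soup)     # <div class="box"><p id="b">two</p></div>

# The detached tag can be put back elsewhere, here at the end of <div>
soup.div.append(removed)

# del removes an attribute
del soup.div["class"]
print(soup)     # <div><p id="b">two</p><p id="a">one</p></div>
```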