Python Web Scraping in Practice, Lesson 6: BeautifulSoup Basics (1)

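This article shows how to use Python's BeautifulSoup library to parse an HTML document and extract key information. Three examples demonstrate selecting different elements: the h1 heading, all a tags, and the text content of each a tag.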


1.

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup

# Sample HTML document to parse
html_sample = '''
<html>
<body>
<h1 id="title">Hello World</h1>
<a href="#" class="link">this is link1</a>
<a href="# link2" class="link">This is link2</a>
</body>
</html>'''

soup = BeautifulSoup(html_sample, 'html.parser')

# select() returns a list of every element matching the CSS selector
header = soup.select('h1')
print(header)

# The first (and only) matching h1 tag
print(header[0])

# The text inside the tag, without the markup
print(header[0].text)


Output:

[<h1 id="title">Hello World</h1>]

<h1 id="title">Hello World</h1>

Hello World
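
Because select() returns a list, indexing with [0] fails when nothing matches. As a minimal sketch (the one-line markup below is a shortened, hypothetical version of the sample above), select_one() can be used when only the first match is needed; it returns None instead of raising an error when the selector finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, shortened from the example above.
html_sample = '<html><body><h1 id="title">Hello World</h1></body></html>'

soup = BeautifulSoup(html_sample, 'html.parser')

# select() always returns a list, so guard against an empty result before indexing.
headers = soup.select('h1')
first_text = headers[0].text if headers else None

# select_one() returns the first matching tag directly, or None if there is no match.
header = soup.select_one('h1')
missing = soup.select_one('h2')

print(first_text)   # Hello World
print(header.text)  # Hello World
print(missing)      # None
```

Checking for None before reading .text avoids an AttributeError on pages where the element may be absent.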

2.

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup

# Sample HTML document to parse
html_sample = '''
<html>
<body>
<h1 id="title">Hello World</h1>
<a href="#" class="link">this is link1</a>
<a href="# link2" class="link">This is link2</a>
</body>
</html>'''

soup = BeautifulSoup(html_sample, 'html.parser')

# Select every a tag in the document
alink = soup.select('a')
print(alink)

# Print each matching tag on its own line
for link in alink:
    print(link)

Output:

[<a class="link" href="#">this is link1</a>, <a class="link" href="# link2">This is link2</a>]


<a class="link" href="#">this is link1</a>

<a class="link" href="# link2">This is link2</a>
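
Each selected tag also exposes its attributes, which is usually what a scraper wants from a link. The following is a small sketch, assuming the same two links as above (the variable names hrefs and classes are illustrative): get('href') reads an attribute and returns None if it is absent, while class is returned as a list because it can hold several values.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing only the two links from the example above.
html_sample = '''
<html><body>
<a href="#" class="link">this is link1</a>
<a href="# link2" class="link">This is link2</a>
</body></html>
'''

soup = BeautifulSoup(html_sample, 'html.parser')

hrefs = []
classes = []
for link in soup.select('a'):
    # get() returns None for a missing attribute; link['href'] would raise KeyError instead.
    hrefs.append(link.get('href'))
    # class is a multi-valued attribute, so BeautifulSoup returns it as a list.
    classes.append(link.get('class'))

print(hrefs)    # ['#', '# link2']
print(classes)  # [['link'], ['link']]
```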


3.

# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup

# Sample HTML document to parse
html_sample = '''
<html>
<body>
<h1 id="title">Hello World</h1>
<a href="#" class="link">this is link1</a>
<a href="# link2" class="link">This is link2</a>
</body>
</html>'''

soup = BeautifulSoup(html_sample, 'html.parser')
alink = soup.select('a')

# Print only the text content of each a tag, not the tag itself
for link in alink:
    print(link.text)

Output:

this is link1
This is link2
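
In real pages, link text often carries stray whitespace. As a hedged sketch (the padded markup and the names raw_texts and clean_texts are made up for illustration), the texts can be collected into a list, and get_text(strip=True) can be used in place of .text to trim them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with padded link text, for illustration only.
html_sample = '<body><a href="#">  this is link1 </a><a href="#">This is link2</a></body>'

soup = BeautifulSoup(html_sample, 'html.parser')

# .text keeps the whitespace exactly as it appears in the document.
raw_texts = [link.text for link in soup.select('a')]

# get_text(strip=True) removes leading and trailing whitespace from the text.
clean_texts = [link.get_text(strip=True) for link in soup.select('a')]

print(raw_texts)    # ['  this is link1 ', 'This is link2']
print(clean_texts)  # ['this is link1', 'This is link2']
```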
