BeautifulSoup
Used for parsing HTML or XML
pip install beautifulsoup4
import bs4
Steps
1. Create a BeautifulSoup object
2. Query nodes
find: returns the first node that matches the conditions
find_all: returns all nodes that match the conditions (both are shown in the sketch below)
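A minimal sketch of the two steps; the html_doc snippet is made up here purely for illustration:
from bs4 import BeautifulSoup

html_doc = "<a href='a.html'>first</a><a href='b.html'>second</a>"

# Step 1: create the BeautifulSoup object
bs = BeautifulSoup(html_doc, 'html.parser')

# Step 2: query nodes
first_link = bs.find('a')       # only the first matching node
all_links = bs.find_all('a')    # a list of all matching nodes
print(first_link['href'])       # a.html
print(len(all_links))           # 2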
Creating a BeautifulSoup object
bs = BeautifulSoup(
    html_doc,               # the HTML/XML text to parse (not a URL)
    'html.parser',          # specify the parser, e.g. 'html.parser' or 'lxml'
    from_encoding='utf-8'   # specify the encoding (must match the page's encoding; only applies when html_doc is bytes)
)
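A short sketch of building the object from a fetched page, assuming requests is used to download it (the URL reuses the pm25.in example from later in this post, and utf-8 is an assumed encoding):
import requests
from bs4 import BeautifulSoup

r = requests.get('http://pm25.in/beijing', timeout=30)
# Passing the raw bytes (r.content) lets from_encoding force the page's encoding;
# if you pass the already-decoded r.text instead, no encoding argument is needed.
bs = BeautifulSoup(r.content, 'html.parser', from_encoding='utf-8')
print(bs.title.text)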
Finding nodes
<a href='a.html' class='a_link'>
next page
</a>
Nodes can be matched by tag type, attribute, or content
Find nodes by type
bs.find_all('a')
Find nodes by attribute
bs.find_all('a', href='a.html')
bs.find_all('a', href='a.html', string='next page')
bs.find_all('a', class_='a_link')
Note: use class_ with a trailing underscore, because class is a reserved keyword in Python
Alternatively: bs.find_all('a', {'class': 'a_link'})
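Once a node is matched, its tag name, attributes, and text can be read directly; a small sketch based on the <a> tag above (variable names are illustrative):
from bs4 import BeautifulSoup

html_doc = "<a href='a.html' class='a_link'>next page</a>"
bs = BeautifulSoup(html_doc, 'html.parser')

link = bs.find('a', class_='a_link')
print(link.name)        # a          - the node (tag) type
print(link['href'])     # a.html     - attribute access
print(link.text)        # next page  - text content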
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup


def get_city_aqi(city_pinyin):
    """
    Fetch the AQI readings of a city from pm25.in.
    :param city_pinyin: city name in pinyin, e.g. 'beijing'
    :return: list of (indicator, value) tuples
    """
    url = 'http://pm25.in/' + city_pinyin
    r = requests.get(url, timeout=30)
    soup = BeautifulSoup(r.text, 'lxml')
    div_list = soup.find_all('div', {'class': 'span1'})

    city_aqi = []
    for i in range(8):  # the first 8 'span1' divs hold the AQI indicators
        div_content = div_list[i]
        caption = div_content.find('div', {'class': 'caption'}).text.strip()
        value = div_content.find('div', {'class': 'value'}).text.strip()
        city_aqi.append((caption, value))
    return city_aqi


def main():
    city_pinyin = input('Enter the city name in pinyin: ')  # raw_input under Python 2
    print(city_pinyin)
    city_aqi = get_city_aqi(city_pinyin)
    print(city_aqi)


if __name__ == '__main__':
    main()

This post introduces how to parse HTML or XML documents with the BeautifulSoup library, covering installation, basic usage, and example code showing how to create the object, query nodes, and access them by type, attribute, or content.