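Python - Parsing XML and HTML with XPath
Absolute path: /html/body/div/a
Relative path: ./a
Terminology
Tree: the entire HTML or XML structure
Node: every tag in HTML is a node; in XML, every tag is likewise a node
Root node: the first (topmost) node of the tree; in HTML the root node is the html tag
Attribute: an attribute of a node (in HTML, a tag attribute)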
from lxml import etree
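The terms above can be made concrete with a small sketch. The HTML string and the example.com link below are invented for illustration and are not part of the original notes; the sketch assumes lxml is installed.

```python
from lxml import etree

# A tiny HTML document, made up for this illustration only.
html_text = '<html><body><div><a href="https://example.com">link</a></div></body></html>'

# Tree and root node: etree.HTML parses the text and returns the root node.
root = etree.HTML(html_text)
print(root.tag)  # html

# Nodes: every tag is a node; iter() walks all element nodes in document order.
tags = [node.tag for node in root.iter()]
print(tags)  # ['html', 'body', 'div', 'a']

# Attributes: read an attribute of a node with .get().
a = root.xpath('/html/body/div/a')[0]
print(a.get('href'))  # https://example.com
```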
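Parsing XML with XPath
The XML data format
JSON and XML are two general-purpose data formats used to exchange data between different programming languages
Transmitting the product data of one supermarket:
JSON: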
{
"name":"永辉超市",
"address":"肖家河",
"goods":[
{"name":"泡面","price":3.5,"count":50},
{"name":"火腿肠","price":3,"count":200},
{"name":"矿泉水","price":2,"count":30}
]
}
XML:
<supermarket>
<name>永辉超市</name>
<address>肖家河</address>
<goodsList>
<goods name = "泡面" price = "3.5" count = "50"></goods>
<goods name = "火腿肠" price = "3" count = "200"></goods>
<goods name = "矿泉水" price = "2" count = "30"></goods>
</goodsList>
<workerList>
<cashier name = "张三" pay = "4000"></cashier>
<shoppingGuide name = "李四" pay = "3000"></shoppingGuide>
</workerList>
</supermarket>
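To illustrate that the two formats can carry the same information, here is a minimal sketch that reads one product from a shortened JSON string and from a shortened XML string and compares the results. The shortened strings and the variable names are assumptions made for this example. Note that XML attribute values are always strings, so the numbers have to be converted by hand.

```python
import json
from lxml import etree

# Shortened versions of the supermarket data above (one product only).
json_text = '{"name": "永辉超市", "goods": [{"name": "泡面", "price": 3.5, "count": 50}]}'
xml_text = ('<supermarket><name>永辉超市</name><goodsList>'
            '<goods name="泡面" price="3.5" count="50"></goods>'
            '</goodsList></supermarket>')

# JSON keeps its types: price is a float and count is an int after parsing.
json_goods = json.loads(json_text)["goods"][0]

# XML attributes are plain strings, so convert them explicitly.
node = etree.XML(xml_text).xpath('/supermarket/goodsList/goods')[0]
xml_goods = {
    "name": node.get("name"),
    "price": float(node.get("price")),
    "count": int(node.get("count")),
}

print(json_goods == xml_goods)  # True
```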
- Prepare the data
xml_data ="""
<supermarket>
<name>永辉超市</name>
<address>肖家河</address>
<goodsList>
<goods name = "泡面" price = "3.5" count = "50"></goods>
<goods name = "火腿肠" price = "3" count = "200"></goods>
<goods name = "矿泉水" price = "2" count = "30"></goods>
</goodsList>
<workerList>
<cashier name = "张三" pay = "4000"></cashier>
<shoppingGuide name = "李四" pay = "3000"></shoppingGuide>
</workerList>
</supermarket>
"""
- Create the tree object and get the root node of the data
supermarket = etree.XML(xml_data)
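A quick way to check that etree.XML returns the root node is to inspect the returned element. The sketch below uses a shortened copy of the data so that it runs on its own; the shortening is an assumption for illustration.

```python
from lxml import etree

# Shortened copy of the supermarket data, for illustration only.
xml_data = """
<supermarket>
  <name>永辉超市</name>
  <goodsList>
    <goods name="泡面" price="3.5" count="50"></goods>
  </goodsList>
</supermarket>
"""

supermarket = etree.XML(xml_data)

# The returned object is the root element of the parsed tree.
print(supermarket.tag)  # supermarket

# The root element has no parent element.
print(supermarket.getparent())  # None

# Iterating an element yields its direct child nodes.
children = [child.tag for child in supermarket]
print(children)  # ['name', 'goodsList']
```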
- Get tags (get nodes)
node.xpath(path)
a. Absolute path: no matter which node xpath is called on, the path is written starting from the root node
Syntax: /absolute/path
cashier = supermarket.xpath('/supermarket/workerList/cashier')
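The claim that an absolute path ignores the calling node can be checked with a short sketch. The data is a shortened copy of the example, and goods_list is a helper variable introduced here for illustration.

```python
from lxml import etree

# Shortened copy of the supermarket data, for illustration only.
xml_data = """<supermarket>
<goodsList><goods name="泡面" price="3.5" count="50"></goods></goodsList>
<workerList><cashier name="张三" pay="4000"></cashier></workerList>
</supermarket>"""

supermarket = etree.XML(xml_data)
goods_list = supermarket.xpath('/supermarket/goodsList')[0]

# Same absolute path, called on two different nodes.
from_root = supermarket.xpath('/supermarket/workerList/cashier')
from_child = goods_list.xpath('/supermarket/workerList/cashier')

# Both calls find the cashier, even though goods_list does not contain it.
print(from_root[0].get('name'))   # 张三
print(from_child[0].get('name'))  # 张三
```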
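b. Relative path: . denotes the current node, which is whichever node xpath is called on
.. denotes the parent of the current node
Note: a leading ./ can be omitted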
cashier = supermarket.xpath('./workerList/cashier')
print(cashier) #[<Element cashier at 0x1d4299ba980>]
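The relative-path rules can be sketched as follows, again on a shortened copy of the data. The goods_list variable is a helper introduced for this example. The sketch shows that ./ is optional, that the current node is the node xpath is called on, and that a path through the parent only matches when the calling node has a parent containing that path.

```python
from lxml import etree

# Shortened copy of the supermarket data, for illustration only.
xml_data = """<supermarket>
<goodsList><goods name="泡面" price="3.5" count="50"></goods></goodsList>
<workerList><cashier name="张三" pay="4000"></cashier></workerList>
</supermarket>"""

supermarket = etree.XML(xml_data)
goods_list = supermarket.xpath('./goodsList')[0]

# With or without the leading ./ the same node is selected.
with_dot = supermarket.xpath('./workerList/cashier')
without_dot = supermarket.xpath('workerList/cashier')
print(with_dot[0].get('name') == without_dot[0].get('name'))  # True

# Here the current node is goods_list, so ./goods is resolved from it.
goods_names = [g.get('name') for g in goods_list.xpath('./goods')]
print(goods_names)  # ['泡面']

# The parent of goods_list is supermarket, so this path reaches the cashier.
via_parent = goods_list.xpath('../workerList/cashier')
print(via_parent[0].get('name'))  # 张三

# Above the root element there is no workerList, so nothing matches.
from_root_parent = supermarket.xpath('../workerList/cashier')
print(from_root_parent)  # []
```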
cashier = supermarket.xpath('../workerList/cashier')
print(