I. Installation
pip3 install lxml
II. Import and Instantiation
Import
from lxml import etree
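Instantiation
# From a local HTML file (filepath is the path to that file); passing an HTMLParser lets lxml tolerate HTML that is not well-formed XML
html=etree.parse(filepath, etree.HTMLParser())
From a web page (page_text is the response text obtained with requests)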
html=etree.HTML(page_text)
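The two ways of building a tree can be compared in a minimal sketch. The markup string below is a hypothetical stand-in for the text that requests would return, and io.StringIO stands in for a file on disk, so neither is part of the original example.

```python
from io import StringIO

from lxml import etree

# Hypothetical stand-in for response.text returned by requests
page_text = "<div><p>hello</p></div>"

# etree.HTML parses a string and returns the root element;
# the fragment is wrapped in <html><body> automatically
html = etree.HTML(page_text)
root_tag = html.tag

# etree.parse reads from a path or file-like object and returns an ElementTree;
# StringIO is used here only to avoid touching the file system
tree = etree.parse(StringIO(page_text), etree.HTMLParser())
tree_root_tag = tree.getroot().tag

# Serialize the parsed tree back to text to inspect what lxml built
serialized = etree.tostring(html, encoding="unicode")
print(root_tag, tree_root_tag)
print(serialized)
```

Note that etree.HTML returns an element while etree.parse returns a tree object; both support the xpath method.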
III. XPath Rules
Code example
page_text='''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
'''
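The last li in the sample above has no closing tag. As a rough sketch of what happens when this text is parsed, the snippet below feeds the same markup to etree.HTML and serializes the result; the variable names closing_li_count and serialized are only illustrative.

```python
from lxml import etree

page_text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
'''

# The HTML parser is lenient: it closes the unclosed <li>
# and wraps the fragment in <html> and <body>
html = etree.HTML(page_text)

# Serialize the repaired tree so the inserted tags are visible
serialized = etree.tostring(html, encoding="unicode")
closing_li_count = serialized.count("</li>")

print(serialized)
print(closing_li_count)
```

Because of this repair step, XPath queries run against the parsed tree see five complete li elements, plus the html and body elements that were not in the original string.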
1. All nodes
result=html.xpath('//*')
# Output
[<Element html at 0x7fe3287b7b40>, <Element body at 0x7fe31881bb80>, <Element div at 0x7fe31881bb40>, <Element ul at 0x7fe31881ba80>, <