Python Parse XML
Use ElementTree
0. Import
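ElementTree is built for working with XML, and the Python standard library has shipped two implementations of it: the pure-Python xml.etree.ElementTree and the faster C implementation xml.etree.cElementTree. Prefer the C implementation whenever it is available, since it is faster and uses less memory. Note that since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically, and xml.etree.cElementTree was removed in Python 3.9, so on a modern interpreter a plain import of xml.etree.ElementTree is enough. If your code must also run on older interpreters where _elementtree (the C accelerator) may be missing, use a fallback import: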
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
Suppose our XML file looks like this:
<?xml version="1.0"?>
<doc>
    <branch name="testing" hash="1cdf045c">
        text,source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>
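1. Load and parse the XML file
# xml.etree.cElementTree was removed in Python 3.9; ElementTree uses the C accelerator when available
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='doc1.xml')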
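To experiment without a file on disk, the same document can be parsed from a string. This is a minimal sketch, assuming the sample XML above is stored in a variable called SAMPLE_XML (a name introduced here for illustration only). ET.fromstring returns the root element directly, while ET.parse takes a filename or file-like object and returns an ElementTree:

```python
import io
import xml.etree.ElementTree as ET

# SAMPLE_XML is a hypothetical name; it holds the sample document shown earlier.
SAMPLE_XML = """<?xml version="1.0"?>
<doc>
    <branch name="testing" hash="1cdf045c">
        text,source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>
"""

# fromstring() parses text and hands back the root Element itself.
root_from_string = ET.fromstring(SAMPLE_XML)

# parse() accepts a filename or a file-like object and returns an ElementTree.
tree_from_file = ET.parse(io.StringIO(SAMPLE_XML))
root_from_file = tree_from_file.getroot()

print(root_from_string.tag, len(root_from_string))
print(root_from_file.tag, len(root_from_file))
```

Both routes produce the same element structure; which one to use depends only on where the XML comes from.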
2. Get the root node
Get the root node and print some of its properties:
root = tree.getroot()
print(root.tag, root.attrib)
3. Iterate over the root node
for child_of_root in root:
    print(child_of_root.tag, child_of_root.attrib)
You can also reach a specific child node by index:
first_child = root[0]
second_child = root[1]
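Once you have a child element, its attributes and text are available through get() and .text. The sketch below uses a compact copy of the sample document so that it runs on its own; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample document, without the whitespace-only text nodes.
root = ET.fromstring(
    '<doc>'
    '<branch name="testing" hash="1cdf045c">text,source</branch>'
    '<branch name="release01" hash="f200013e">'
    '<sub-branch name="subrelease01">xml,sgml</sub-branch>'
    '</branch>'
    '<branch name="invalid"></branch>'
    '</doc>'
)

first_child = root[0]
print(first_child.get("name"))   # attribute lookup by name
print(first_child.text)          # text directly inside the element

# Indexing can be chained to reach nested elements.
sub_branch = root[1][0]
print(sub_branch.tag, sub_branch.text)

# get() accepts a default for attributes that are missing.
third_child = root[2]
print(third_child.get("hash", "n/a"))
```

Index access depends on the order of the children, so it is best suited to documents whose layout is fixed.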
4. Find the nodes of interest
for elem in tree.iter(tag='branch'):
    print(elem.tag, elem.attrib)
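Besides iter(), ElementTree offers find() and findall(), which accept a limited XPath-style path. The following sketch, again using a compact copy of the sample document, shows how they might be combined; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<doc>'
    '<branch name="testing" hash="1cdf045c">text,source</branch>'
    '<branch name="release01" hash="f200013e">'
    '<sub-branch name="subrelease01">xml,sgml</sub-branch>'
    '</branch>'
    '<branch name="invalid"></branch>'
    '</doc>'
)

# iter() walks the whole subtree in document order.
branch_names = [elem.get("name") for elem in root.iter("branch")]

# findall() with a bare tag only looks at direct children.
direct_branches = root.findall("branch")

# ".//" searches all descendants.
sub_branches = root.findall(".//sub-branch")

# A predicate selects by attribute value; find() returns the first match or None.
release = root.find("branch[@name='release01']")
missing = root.find("no-such-tag")

print(branch_names)
print(len(direct_branches), len(sub_branches))
print(release.get("hash"), missing)
```

Because find() returns None when nothing matches, check the result before reading attributes from it.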
5. Example: parsing a VOC annotation
The XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<annotation>
    <folder>2018-09-10-145551</folder>
    <filename>2018-09-10-145551-0.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
    </source>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <object>
        <name>human_red</name>
        <bndbox>
            <xmin>1087</xmin>
            <ymin>599</ymin>
            <xmax>1160</xmax>
            <ymax>734</ymax>
        </bndbox>
    </object>
</annotation>
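Parsing code:
from os.path import join  # assumed: join here is os.path.join
import xml.etree.ElementTree as ET
import cv2

# xml_path and file_dic are defined elsewhere; file_dic is assumed to map
# a folder name to the directory that holds its images.
tree = ET.ElementTree(file=xml_path)
root = tree.getroot()
folder = root[0]    # <folder>
filename = root[1]  # <filename>
real_path = join(file_dic[folder.text], filename.text)
img = cv2.imread(real_path)
for elem in tree.iter(tag='object'):
    name = elem[0]    # <name>
    bndbox = elem[1]  # <bndbox>
    xmin = bndbox[0]
    ymin = bndbox[1]
    xmax = bndbox[2]
    ymax = bndbox[3]
    # Draw the bounding box, then the label just above its top-left corner.
    cv2.rectangle(img, (int(xmin.text), int(ymin.text)), (int(xmax.text), int(ymax.text)), (255, 0, 0), 3)
    cv2.putText(img, name.text, (int(xmin.text), int(ymin.text) - 9), cv2.FONT_HERSHEY_COMPLEX, 2, (0, 0, 255), 5)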
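The index-based lookups above rely on the elements appearing in a fixed order. As a possible alternative, the same values can be read by tag name, which keeps working if the order changes. This is a sketch only: parse_voc_objects and VOC_XML are hypothetical names introduced here, and the drawing step with cv2 is left out so the block runs without OpenCV:

```python
import xml.etree.ElementTree as ET

# VOC_XML is a hypothetical name; it holds a trimmed copy of the annotation above.
VOC_XML = """<annotation>
    <folder>2018-09-10-145551</folder>
    <filename>2018-09-10-145551-0.jpg</filename>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <object>
        <name>human_red</name>
        <bndbox>
            <xmin>1087</xmin>
            <ymin>599</ymin>
            <xmax>1160</xmax>
            <ymax>734</ymax>
        </bndbox>
    </object>
</annotation>
"""


def parse_voc_objects(root):
    """Return a list of (name, (xmin, ymin, xmax, ymax)) for each <object>.

    parse_voc_objects is a hypothetical helper, not part of any library.
    """
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, coords))
    return boxes


root = ET.fromstring(VOC_XML)
image_name = root.findtext("filename")
width = int(root.findtext("size/width"))
height = int(root.findtext("size/height"))
objects = parse_voc_objects(root)

print(image_name, width, height)
print(objects)
```

Each returned tuple can then be passed to cv2.rectangle and cv2.putText in the same way as in the loop above.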