Python --- Parse XML

ChadPro

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/xihuandiannao/article/details/85067792

Python Parse XML

Use ElementTree

0. Import

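ElementTree was built for processing XML. The Python standard library has historically shipped two implementations: the pure-Python xml.etree.ElementTree and the faster C implementation xml.etree.cElementTree. The C implementation is faster and uses less memory, so prefer it where it exists. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically whenever it is available; xml.etree.cElementTree was deprecated in 3.3 and removed in Python 3.9. If the code must also run on older interpreters, or on a machine without the _elementtree module, import it like this: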

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

The example XML file used below is:

<?xml version="1.0"?>
<doc>
    <branch name="testing" hash="1cdf045c">
        text,source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>

1. Load and parse the XML file

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='doc1.xml')
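
The step above assumes doc1.xml exists on disk. As a minimal sketch of the same step that runs without any file, the sample document can be inlined and parsed with ET.fromstring; the SAMPLE_XML name is introduced here purely for illustration.

```python
import xml.etree.ElementTree as ET

# The sample document from above, inlined so the sketch runs without doc1.xml.
SAMPLE_XML = """<?xml version="1.0"?>
<doc>
    <branch name="testing" hash="1cdf045c">
        text,source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>
"""

# fromstring() parses text and returns the root Element directly.
root = ET.fromstring(SAMPLE_XML)
print(root.tag, len(root))

# For a file on disk, ET.parse(path) returns an ElementTree object instead;
# call .getroot() on it to reach the same root Element.
```

Note that fromstring returns an Element while ET.parse and ET.ElementTree(file=...) return an ElementTree, so getroot() is only needed in the file-based case.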

2. Get the root node

Get the root node and print its tag and attributes:

root = tree.getroot()
print(root.tag, root.attrib)

3. Iterate over the root node's children

for child_of_root in root:
    print(child_of_root.tag, child_of_root.attrib)

A child node can also be accessed by index:

first_child = root[0]
second_child = root[1]
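
Once a child element is in hand, its attributes and text are usually what matter. The sketch below is a hedged illustration on a compacted copy of the sample document (whitespace removed for brevity); the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Compact version of the sample document, built inline for illustration.
root = ET.fromstring(
    '<doc>'
    '<branch name="testing" hash="1cdf045c">text,source</branch>'
    '<branch name="release01" hash="f200013e">'
    '<sub-branch name="subrelease01">xml,sgml</sub-branch>'
    '</branch>'
    '<branch name="invalid"/>'
    '</doc>'
)

first_child = root[0]
name = first_child.get("name")        # attribute lookup; returns None if absent
missing = root[2].get("hash", "n/a")  # a default can be supplied for absent attributes
text = first_child.text.strip()       # text content of the element
nested = root[1][0]                   # indexing chains: first child of the second branch

print(name, missing, text, nested.tag)
```

Index access raises IndexError when the child does not exist, so it fits best when the document layout is fixed and known in advance.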

4. Find the nodes of interest

for elem in tree.iter(tag='branch'):
    print(elem.tag, elem.attrib)
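
Besides iter, ElementTree supports a limited XPath subset through find and findall, which can filter by attribute as well as by tag. The following is a small sketch on a compacted copy of the sample document; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<doc>'
    '<branch name="testing" hash="1cdf045c">text,source</branch>'
    '<branch name="release01" hash="f200013e">'
    '<sub-branch name="subrelease01">xml,sgml</sub-branch>'
    '</branch>'
    '<branch name="invalid"/>'
    '</doc>'
)

# iter() walks the whole subtree and yields every element with the given tag.
all_branches = [e.get("name") for e in root.iter("branch")]

# findall() with a predicate keeps only direct children that carry a hash attribute.
hashed = [e.get("name") for e in root.findall("branch[@hash]")]

# ".//" searches at any depth below the current element.
subs = [e.get("name") for e in root.findall(".//sub-branch")]

# find() returns the first match, or None when nothing matches.
release = root.find("branch[@name='release01']")

print(all_branches, hashed, subs, release.get("hash"))
```

The difference to keep in mind is depth: findall("branch") looks only at direct children, while iter("branch") and ".//branch" descend through the whole tree.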

5. Example: parsing a VOC annotation

The XML file is as follows:

<?xml version="1.0" encoding="utf-8"?>

<annotation>
  <folder>2018-09-10-145551</folder>
  <filename>2018-09-10-145551-0.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
  </source>
  <size>
    <width>1920</width>
    <height>1080</height>
    <depth>3</depth>
  </size>
  <segmented>1</segmented>
  <object>
    <name>human_red</name>
    <bndbox>
      <xmin>1087</xmin>
      <ymin>599</ymin>
      <xmax>1160</xmax>
      <ymax>734</ymax>
    </bndbox>
  </object>
</annotation>

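Parsing code:

import xml.etree.ElementTree as ET
from os.path import join

import cv2

# xml_path (path to the annotation file) and file_dic (a mapping from the
# <folder> value to an image directory) are assumed to be defined by the caller.
tree = ET.ElementTree(file=xml_path)
root = tree.getroot()
folder = root[0]    # <folder>
filename = root[1]  # <filename>

real_path = join(file_dic[folder.text], filename.text)
img = cv2.imread(real_path)

for elem in tree.iter(tag='object'):
    name = elem[0]    # <name>
    bndbox = elem[1]  # <bndbox>
    xmin = bndbox[0]
    ymin = bndbox[1]
    xmax = bndbox[2]
    ymax = bndbox[3]

    # Draw the bounding box, then the label just above its top-left corner.
    cv2.rectangle(img, (int(xmin.text), int(ymin.text)), (int(xmax.text), int(ymax.text)), (255, 0, 0), 3)
    cv2.putText(img, name.text, (int(xmin.text), int(ymin.text) - 9), cv2.FONT_HERSHEY_COMPLEX, 2, (0, 0, 255), 5)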
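
The parsing code above relies on child positions (root[0], elem[1], bndbox[2]), which breaks if an annotation tool writes the elements in a different order or adds extra fields such as pose or difficult. A possible alternative, sketched here without the OpenCV drawing step, looks children up by tag name instead. The read_boxes helper and the trimmed VOC_XML string are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation above, inlined so the sketch is self-contained.
VOC_XML = """<annotation>
  <folder>2018-09-10-145551</folder>
  <filename>2018-09-10-145551-0.jpg</filename>
  <object>
    <name>human_red</name>
    <bndbox>
      <xmin>1087</xmin>
      <ymin>599</ymin>
      <xmax>1160</xmax>
      <ymax>734</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_boxes(root):
    """Hypothetical helper: return (name, xmin, ymin, xmax, ymax) per <object>."""
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # findtext() returns the text of the first matching child element.
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


root = ET.fromstring(VOC_XML)
filename = root.findtext("filename")
boxes = read_boxes(root)
print(filename, boxes)
```

The resulting tuples can then be passed to cv2.rectangle and cv2.putText exactly as in the original loop.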

More. Further reading

Link