python reading XML to get data containing specified tags_Using python to parse child tags in XML with a specific matching string...

weixin_39559079

CC 4.0 BY-SA license

Article tags: python, reading XML to get data containing specified tags

Article link: https://blog.youkuaiyun.com/weixin_39559079/article/details/113966688

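This article shows how to parse an XML file in Python, specifically how to find and extract every child tag whose name starts with 'Topic' under the parent tag 'Topics'. It uses the xml.etree.cElementTree library (deprecated since Python 3.3 and removed in Python 3.9; xml.etree.ElementTree is the drop-in replacement) and selects the target elements by checking whether each element's tag starts with the specific namespace followed by 'Topic'.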

I want to parse an XML string that has the tag Topics as the parent tag and Topic1, Topic2 as child tags.

<?xml version="1.0" encoding="UTF-8"?>Regulatory / Company InvestigationMergers & AcquisitionsLitigation / RegulatoryOwnership / Control

I just want to parse this XML so that I can get the attribute values of each Topic tag, and I want to do it inside a for loop.

I have tried the following code:

import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9
tree = ET.ElementTree(file='sample.xml')
# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print('child: ', child_tags.tag)

I just want to add some wildcard like Topic\d to the second-to-last line (the inner findall call) so that I can parse every tag that matches Topic.
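ElementTree's findall paths support only a small XPath subset (tag names, `*`, `.`, `//`, `..` and simple predicates), with no regular-expression matching on tag names, so a pattern like `Topic\d` has to be applied in Python after selecting the children. A minimal sketch, assuming a hypothetical root element named `Development` (the question does not name it) and a made-up `<Other>` element to show that non-matching children are skipped:

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "urn:reuterscompanycontent:significantdevelopments03"
# Hypothetical minimal document: the root tag "Development" and the <Other>
# element are made up; the namespace and the Topic texts come from the question.
XML = (
    f'<Development xmlns="{NS_URI}"><Topics>'
    "<Topic1>Regulatory / Company Investigation</Topic1>"
    "<Topic2>Mergers &amp; Acquisitions</Topic2>"
    "<Other>not a topic</Other>"
    "</Topics></Development>"
)
TOPIC_RE = re.compile(r"Topic\d+")  # "Topic" followed by one or more digits


def local_name(tag):
    """Drop the '{namespace}' part of a Clark-notation tag."""
    return tag.rpartition("}")[2]


def matching_topics(xml_text):
    """Return (local name, text) for each child of <Topics> matching TOPIC_RE."""
    root = ET.fromstring(xml_text)
    matches = []
    for topics in root.findall(f"{{{NS_URI}}}Topics"):  # '{uri}Topics'
        for child in topics:  # direct children of <Topics>
            if TOPIC_RE.fullmatch(local_name(child.tag)):
                matches.append((local_name(child.tag), child.text))
    return matches


for name, text in matching_topics(XML):
    print(name, text)
```

Using `fullmatch` rather than `startswith` means a tag such as `TopicsSummary` would not be picked up by accident.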

Solution

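You can check whether the tag attribute starts with the namespace followed by the prefix Topic. ElementTree stores the tag of a namespaced element in Clark notation, {namespace-uri}localname, so the string to test is {urn:reuterscompanycontent:significantdevelopments03}Topic. For example: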

import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9

# sample.xml holds the XML document shown in the question
root = ET.parse('sample.xml').getroot()
# '*/*' selects every grandchild of the root (here, the children of <Topics>)
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)

Or, more concisely:

import xml.etree.ElementTree as ET  # xml.etree.cElementTree was removed in Python 3.9

# sample.xml holds the XML document shown in the question
root = ET.parse('sample.xml').getroot()
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)

Or put the check in an if statement inside the for statement.
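Since the question asks for the attribute values of each Topic tag, here is a minimal sketch of the for/if form that collects both the text and the attributes (`el.attrib`, a dict). The root tag `Development` and the `code` attribute are hypothetical placeholders, because the question's XML does not show them; the namespace and the texts come from the question.

```python
import xml.etree.ElementTree as ET

NS_URI = "urn:reuterscompanycontent:significantdevelopments03"
TOPIC_PREFIX = "{" + NS_URI + "}Topic"  # Clark-notation prefix tested by the answer

# Hypothetical stand-in for the question's XML: the root tag "Development" and
# the "code" attribute are made up for illustration.
SAMPLE_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<Development xmlns="{NS_URI}">
  <Topics>
    <Topic1 code="c1">Regulatory / Company Investigation</Topic1>
    <Topic2 code="c2">Mergers &amp; Acquisitions</Topic2>
    <Topic3 code="c3">Litigation / Regulatory</Topic3>
    <Topic4 code="c4">Ownership / Control</Topic4>
  </Topics>
</Development>"""


def collect_topics(xml_text):
    """Return (text, attributes) for every grandchild of the root named Topic*."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.findall("*/*"):  # all grandchildren of the root
        if el.tag.startswith(TOPIC_PREFIX):  # the check, now inside the loop
            found.append((el.text, dict(el.attrib)))  # .attrib holds the attributes
    return found


for text, attrs in collect_topics(SAMPLE_XML):
    print(text, attrs)
```

On Python 3.8 and later, findall also accepts a namespace wildcard, so `root.findall('{*}Topics/*')` selects the children of Topics in any namespace without spelling out the URI.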