I want to parse an XML string that has Topics as the parent tag and Topic1, Topic2 as child tags.
<?xml version="1.0" encoding="UTF-8"?>Regulatory / Company InvestigationMergers & AcquisitionsLitigation / RegulatoryOwnership / Control
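I just want to parse this XML so that I can get the attribute values of each Topic tag, and I want to do it inside a for loop.
I have tried the following code: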
import xml.etree.cElementTree as ET

tree = ET.ElementTree(file='sample.xml')
# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print 'child: ', child_tags.tag
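I would like to add some kind of wildcard, such as Topic\d, in the second-to-last line so that I can match every tag whose name starts with Topic.
Solution
You can check whether each element's tag attribute starts with the namespace followed by the prefix Topic, for example: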
from xml.etree import ElementTree as ET

# Parse the document from the question (sample.xml).
root = ET.parse('sample.xml').getroot()
# '*/*' selects the grandchildren of the root, i.e. the children of Topics.
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)
Or, more compactly:
from xml.etree import ElementTree as ET

# Parse the document from the question (sample.xml).
root = ET.parse('sample.xml').getroot()
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)
Or put the check in an if statement inside the for loop.
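A minimal sketch of that last variant, which also collects each element's attributes since the question asks for them. The sample document is an assumption: the root element name (Root), the Code attribute, and the Other element are placeholders for illustration and are not taken from the original post. Only the namespace, the Topics/Topic1/Topic2 names, and the text values come from the question. The helper name topic_elements is likewise hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document: Root, Code and Other are placeholders,
# not names from the original post.
XML = """<Root xmlns="urn:reuterscompanycontent:significantdevelopments03">
  <Topics>
    <Topic1 Code="1">Regulatory / Company Investigation</Topic1>
    <Topic2 Code="2">Mergers &amp; Acquisitions</Topic2>
    <Other>ignored</Other>
  </Topics>
</Root>"""

# ElementTree stores namespaced tags as '{namespace-uri}localname'.
NS = '{urn:reuterscompanycontent:significantdevelopments03}'


def topic_elements(root):
    """Return (local name, attributes, text) for each grandchild of root
    whose tag starts with the namespace plus 'Topic'."""
    results = []
    for el in root.findall('*/*'):
        # The check lives in an if statement inside the for loop.
        if el.tag.startswith(NS + 'Topic'):
            results.append((el.tag[len(NS):], dict(el.attrib), el.text))
    return results


root = ET.fromstring(XML)
for name, attrs, text in topic_elements(root):
    print(name, attrs, text)
```

Because startswith is a plain prefix test, it would also accept a name such as Topical. If only Topic followed by digits should match, a re.match against the local name is a stricter alternative.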