xml.etree.ElementTree parses and builds XML documents. The examples below use the following sample file, saved as demo.xml:
```xml
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
```
Parsing an XML file
The parse() function reads an XML file and returns an ElementTree.
```python
from xml.etree.ElementTree import parse

tree = parse('demo.xml')  # get the ElementTree
root = tree.getroot()     # get the root element
```
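When the XML is already in memory as a string rather than in a file, fromstring() can be used instead of parse(). The sketch below assumes a shortened inline copy of the sample data; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# shortened inline copy of the sample data (assumption: not the full demo.xml)
xml_text = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# fromstring() parses a string and returns the root Element directly
root = ET.fromstring(xml_text)

# wrap the root in an ElementTree if tree-level methods such as write() are needed
tree = ET.ElementTree(root)

print(root.tag)                # data
print(tree.getroot() is root)  # True
```

Unlike parse(), fromstring() returns the root element itself, so no getroot() call is needed.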
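Element.tag, Element.attrib, and Element.text give an element's tag name, its attribute dictionary, and the text that precedes its first child.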
```python
In [6]: root.tag
Out[6]: 'data'

In [7]: root.attrib
Out[7]: {}

In [25]: root.text
Out[25]: '\n    '
```
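The text of the root is only whitespace because the root contains child elements and no character data of its own; useful text normally sits on leaf elements. A small sketch of the three attributes on different levels, assuming a one-country inline document:

```python
import xml.etree.ElementTree as ET

# one-country inline document (assumption: a cut-down version of the sample)
root = ET.fromstring(
    '<data>\n    <country name="Panama"><rank>68</rank></country>\n</data>'
)
country = root[0]
rank = country[0]

print(repr(root.text))      # '\n    ' : whitespace before the first child
print(country.attrib)       # {'name': 'Panama'}
print(rank.tag, rank.text)  # rank 68

# text is always a string (or None), so convert it explicitly
rank_value = int(rank.text)
```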
Iterating with for child in root yields the direct child elements.
```python
In [8]: for child in root:
   ...:     print(child.tag, child.attrib)
   ...:
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
```
Element.get() returns the value of an attribute.
```python
In [27]: for child in root:
    ...:     print(child.tag, child.get('name'))
    ...:
country Liechtenstein
country Singapore
country Panama
```
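get() also accepts a default that is returned when the attribute is missing, which avoids the KeyError that indexing attrib directly would raise. A minimal sketch; the code attribute is hypothetical and does not appear in the sample data:

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Singapore"/>')

name = country.get('name')                    # attribute present
code = country.get('code')                    # hypothetical attribute, absent: None
code_or_default = country.get('code', 'n/a')  # absent, so the default is returned

print(name, code, code_or_default)
```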
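root.getchildren() returns the direct child elements as a list. The method has been deprecated since Python 3.2 and was removed in Python 3.9; on current versions use list(root) instead.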
```python
In [21]: root.getchildren()
Out[21]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]
```
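On Python 3.9 and later, where getchildren() no longer exists, the same list can be obtained from the element itself, since an Element behaves like a sequence of its children. A short sketch with a two-country inline document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><country name="Liechtenstein"/><country name="Singapore"/></data>'
)

children = list(root)                      # replacement for root.getchildren()
names = [child.get('name') for child in children]

print(len(root))  # number of direct children
print(names)
```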
root[0][1] looks up child elements by index.
```python
In [9]: root[0][1].text
Out[9]: '2008'

In [10]: root[1][0].text
Out[10]: '4'
```
root.find() searches the direct children by tag and returns the first matching element, or None if nothing matches.
```python
In [13]: root.find('country').attrib
Out[13]: {'name': 'Liechtenstein'}
```
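Because find() returns None when there is no match, chaining an attribute access onto it can fail with AttributeError. The sketch below checks for None and also shows findtext(), which returns the text of the first match or a default. The city and area tags are hypothetical and are used only to trigger the no-match case.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><country name="Panama"><rank>68</rank></country></data>'
)

first = root.find('country')
missing = root.find('city')  # hypothetical tag, not in the data

print(first.get('name'))     # Panama
print(missing is None)       # True

# findtext() returns the text of the first match, or the given default
rank = first.findtext('rank')
area = first.findtext('area', 'unknown')  # hypothetical tag, so the default is used
```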
root.findall() searches the direct children by tag and returns a list of all matching elements.
```python
In [16]: for country in root.findall('country'):
    ...:     print(country.attrib)
    ...:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
```
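A common use of findall() is to turn each matching element into a plain Python record. The following is one possible sketch, assuming a shortened inline copy of the sample data; the dictionary layout is illustrative, not something the module prescribes.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<data>
    <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
    <country name="Singapore"><rank>4</rank><year>2011</year></country>
</data>""")

# build one dict per <country>, converting the numeric text fields to int
records = [
    {
        'name': country.get('name'),
        'rank': int(country.findtext('rank')),
        'year': int(country.findtext('year')),
    }
    for country in root.findall('country')
]
print(records)
```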
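root.iterfind() searches the direct children by tag like findall(), but returns a generator that yields the matching elements lazily instead of a list.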
```python
In [22]: root.iterfind('country')
Out[22]: <generator object prepare_child.<locals>.select at 0x7f6736dccfc0>
```
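Since the result of iterfind() is consumed lazily, it can be advanced one element at a time, and anything already consumed is not produced again. A minimal sketch on a three-country inline document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"/>'
    '<country name="Singapore"/>'
    '<country name="Panama"/>'
    '</data>'
)

matches = root.iterfind('country')

first = next(matches)                        # take only the first match
rest = [c.get('name') for c in matches]      # the remaining matches
leftover = list(matches)                     # the iterator is now exhausted

print(first.get('name'), rest, leftover)
```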
Supported XPath expressions (XML Path)
```python
In [19]: root.findall('.//rank')  # elements at any depth
Out[19]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'rank' at 0x7f673581cc78>]

In [32]: root.findall('country/*')  # grandchild elements
Out[32]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'year' at 0x7f673581cbd8>,
 <Element 'gdppc' at 0x7f673581c958>,
 <Element 'neighbor' at 0x7f673581c688>,
 <Element 'neighbor' at 0x7f673581cb38>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'year' at 0x7f673581c5e8>,
 <Element 'gdppc' at 0x7f673581c868>,
 <Element 'neighbor' at 0x7f673581cb88>,
 <Element 'rank' at 0x7f673581cc78>,
 <Element 'year' at 0x7f673581ccc8>,
 <Element 'gdppc' at 0x7f673581cd18>,
 <Element 'neighbor' at 0x7f673581cd68>,
 <Element 'neighbor' at 0x7f673581cdb8>]

In [33]: root.findall('.//rank/..')  # .. selects the parent element
Out[33]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [34]: root.findall('country[@name]')  # country elements that have a name attribute
Out[34]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [35]: root.findall('country[@name="Singapore"]')  # country whose name attribute is Singapore
Out[35]: [<Element 'country' at 0x7f673581ca98>]

In [36]: root.findall('country[rank]')  # country elements that have a rank child
Out[36]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [37]: root.findall('country[rank="68"]')  # country with a rank child whose text is 68
Out[37]: [<Element 'country' at 0x7f673581cc28>]

In [38]: root.findall('country[1]')  # the first country
Out[38]: [<Element 'country' at 0x7f673581c728>]

In [39]: root.findall('country[last()]')  # the last country
Out[39]: [<Element 'country' at 0x7f673581cc28>]

In [40]: root.findall('country[last()-1]')  # the second-to-last country
Out[40]: [<Element 'country' at 0x7f673581ca98>]
```
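Predicates can be combined with further path steps, so a single expression can select elements and then descend from them. The sketch below uses a shortened inline copy of the sample data and only predicate forms already listed above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>""")

# attribute predicate on one step, followed by a further step
panama_neighbors = [
    n.get('name') for n in root.findall("country[@name='Panama']/neighbor")
]

# attribute predicate applied at any depth
eastern = [n.get('name') for n in root.findall(".//neighbor[@direction='E']")]

# child-text predicate used with find() to get a single element
top_ranked = root.find("country[rank='1']").get('name')

print(panama_neighbors, eastern, top_ranked)
```

Results come back in document order, which is why the eastern neighbors are listed per country from top to bottom.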
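root.iter() recursively iterates over the element itself and all of its descendants in document order; passing a tag restricts the result to elements with that tag.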
```python
In [29]: root.iter()
Out[29]: <_elementtree._element_iterator at 0x7f67355dd728>

In [30]: list(root.iter())
Out[30]:
[<Element 'data' at 0x7f673581c778>,
 <Element 'country' at 0x7f673581c728>,
 <Element 'rank' at 0x7f673581c8b8>,
 <Element 'year' at 0x7f673581cbd8>,
 <Element 'gdppc' at 0x7f673581c958>,
 <Element 'neighbor' at 0x7f673581c688>,
 <Element 'neighbor' at 0x7f673581cb38>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'year' at 0x7f673581c5e8>,
 <Element 'gdppc' at 0x7f673581c868>,
 <Element 'neighbor' at 0x7f673581cb88>,
 <Element 'country' at 0x7f673581cc28>,
 <Element 'rank' at 0x7f673581cc78>,
 <Element 'year' at 0x7f673581ccc8>,
 <Element 'gdppc' at 0x7f673581cd18>,
 <Element 'neighbor' at 0x7f673581cd68>,
 <Element 'neighbor' at 0x7f673581cdb8>]

In [31]: list(root.iter('rank'))
Out[31]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'rank' at 0x7f673581cc78>]
```
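Because iter() walks the whole subtree, it suits tasks that ignore nesting depth, such as collecting every element of one kind or counting tags. A possible sketch, assuming a shortened inline copy of the sample data and using collections.Counter for the tally:

```python
import xml.etree.ElementTree as ET
from collections import Counter

root = ET.fromstring("""<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>""")

# iter() without arguments visits the root itself and every descendant
tag_counts = Counter(element.tag for element in root.iter())

# iter(tag) visits only the elements with that tag, at any depth
neighbor_names = [n.get('name') for n in root.iter('neighbor')]

print(tag_counts)
print(neighbor_names)
```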