Python: XML Parsing

Latest recommended article published on 2024-04-15 15:35:05


License: CC 4.0 BY-SA

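This article introduces the basic concepts of XML and highlights its role in data exchange and application configuration. Using Python's ElementTree library, it shows how to load an XML file into memory, get the root node, and parse and manipulate an XML document.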


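What is XML

XML (Extensible Markup Language) is a markup language much like HTML or SGML. It is recommended by the World Wide Web Consortium (W3C) and is available as an open standard. XML is useful for storing small to medium amounts of data without requiring an SQL database.

Uses: data exchange, configuration of applications and websites, and freely extensible nodes.

Characteristics: XML is independent of the operating system, programming language, and development platform, so it can be used to convert data between different systems.

First, prepare a file in XML format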

<?xml version="1.0" encoding='utf-8'?>
<data name="XML">
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

The first step is to import ElementTree

import xml.etree.ElementTree as ET

Then load the document into memory, where it forms an inverted tree structure (the root is at the top)

tree=ET.parse('XML.xml')

Next, get the root node

root=tree.getroot()
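The two loading steps above can be tried without a file on disk. The following is a minimal sketch, assuming a shortened inline copy of the sample document (the `SAMPLE` string is a stand-in for XML.xml, not part of the original article). It shows that `ET.parse` accepts a file-like object and returns an `ElementTree`, while `ET.fromstring` returns the root `Element` directly.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for XML.xml
SAMPLE = """<data name="XML">
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# ET.parse takes a filename or a file-like object and returns an ElementTree
tree = ET.parse(io.StringIO(SAMPLE))
root = tree.getroot()

# ET.fromstring parses a string and returns the root Element directly
root_from_string = ET.fromstring(SAMPLE)

# Iterating over an Element yields its direct children
child_tags = [child.tag for child in root]
print(root.tag, root.attrib, child_tags)
```

Use `ET.parse` when the tree will later be written back with `tree.write`, and `ET.fromstring` when only the elements are needed.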

Finally, you can parse the content and pick out the fields you need
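As one possible way to pick out fields, the sketch below uses `findall`, `findtext`, and `get` to build a dictionary per country. The `SAMPLE` string and the `summary` layout are illustrative assumptions, not the article's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened copy of the sample document
SAMPLE = """<data name="XML">
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

summary = {}
for country in root.findall("country"):
    summary[country.get("name")] = {
        # findtext returns the text of the first matching child
        "rank": int(country.findtext("rank")),
        "year": int(country.findtext("year")),
        # attributes are read with get()
        "neighbors": [n.get("name") for n in country.findall("neighbor")],
    }

print(summary)
```

Element text is always a string, so numeric fields need an explicit conversion such as `int(...)`.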

Finding specific child nodes, then modifying and saving the XML file

XML={}
# root.text is only the whitespace between <data> and the first <country>
print('tag:',root.tag,'attrib:',root.attrib,'text:',root.text)
for ele in root:
    print('tag:',ele.tag,'attrib:',ele.attrib)
    value=[]
    for e in ele:
        # print('tag:',e.tag,'attrib:',e.attrib,'text',e.text)
        if e.text is None:
            # Empty elements such as <neighbor/> keep their data in attributes
            value.append(e.attrib)
        else:
            value.append({e.tag:e.text})
    XML[ele.attrib['name']]=value
print(XML)
node=root.find('country')  # first direct child of root whose tag is country
print(node.attrib['name'])
nodes=root.findall('country')  # all direct children of root whose tag is country
for node in nodes:
    if node.attrib['name']=='Liechtenstein':
        root.remove(node)
        break
tree.write('mingbai.xml')  # save the modified XML file
print('Deletion complete')
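Besides removing nodes, element text and attributes can also be changed before saving. The sketch below is an illustration under assumptions: the `SAMPLE` string, the `updated` attribute, and the rank threshold of 50 are all made up for the example. It writes to an in-memory buffer instead of a file so the result can be inspected. Passing `encoding='utf-8'` and `xml_declaration=True` to `write` keeps non-ASCII text readable and emits the `<?xml ...?>` header; with no arguments, `write` defaults to US-ASCII output without a declaration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-country sample
SAMPLE = (
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '</data>'
)

tree = ET.ElementTree(ET.fromstring(SAMPLE))
root = tree.getroot()

# Modify text and attributes in place
for country in root.findall("country"):
    rank = country.find("rank")
    rank.text = str(int(rank.text) + 1)
    rank.set("updated", "yes")  # "updated" is an illustrative attribute name

# findall returns a list, so removing while looping over it is safe
for country in root.findall("country"):
    if int(country.findtext("rank")) > 50:
        root.remove(country)

# Write to a buffer; a filename such as 'mingbai.xml' works the same way
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```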

The parse() method
The following function creates a SAX parser and uses it to parse a document.

xml.sax.parse( xmlfile, contenthandler[, errorhandler])
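A minimal sketch of calling `xml.sax.parse` follows. The `TagCounter` handler and the inline `SAMPLE` bytes are hypothetical names introduced for illustration. The first argument may be a filename or a file-like object, so an `io.BytesIO` stream is used here. With the default error handler, a malformed document raises `xml.sax.SAXParseException`.

```python
import io
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts how often each tag is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

# Hypothetical inline document standing in for a file on disk
SAMPLE = b'<collection><movie title="A"><year>2013</year></movie><movie title="B"/></collection>'

handler = TagCounter()
xml.sax.parse(io.BytesIO(SAMPLE), handler)
print(handler.counts)

# A mismatched closing tag is reported as a SAXParseException
error_seen = False
try:
    xml.sax.parse(io.BytesIO(b"<a><b></a>"), TagCounter())
except xml.sax.SAXParseException:
    error_seen = True
print("parse error detected:", error_seen)
```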

Prepare a file in XML format

<?xml version="1.0" encoding='utf-8'?>
<collection shelf = "New Arrivals">
<movie title = "Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2013</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title = "Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A scientific fiction</description>
</movie>
   <movie title = "Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title = "Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

Parsing the file

import xml.sax

class MovieHandler( xml.sax.ContentHandler ):
   def __init__(self):
      self.CurrentData = ""
      self.type = ""
      self.format = ""
      self.year = ""
      self.rating = ""
      self.stars = ""
      self.description = ""
   def startElement(self, tag, attributes):
      self.CurrentData = tag
      if tag == "movie":
         print ("*****Movie*****")
         title = attributes["title"]
         print ("Title:", title)
   def endElement(self, tag):
      if self.CurrentData == "type":
         print ("Type:", self.type)
      elif self.CurrentData == "format":
         print ("Format:", self.format)
      elif self.CurrentData == "year":
         print ("Year:", self.year)
      elif self.CurrentData == "rating":
         print ("Rating:", self.rating)
      elif self.CurrentData == "stars":
         print ("Stars:", self.stars)
      elif self.CurrentData == "description":
         print ("Description:", self.description)
      self.CurrentData = ""     # reset the current tag so whitespace between elements is ignored
   def characters(self, content):
      if self.CurrentData == "type":
         self.type = content
      elif self.CurrentData == "format":
         self.format = content
      elif self.CurrentData == "year":
         self.year = content
      elif self.CurrentData == "rating":
         self.rating = content
      elif self.CurrentData == "stars":
         self.stars = content
      elif self.CurrentData == "description":
         self.description = content

if ( __name__ == "__main__"):
   parser = xml.sax.make_parser()    # 1. create an XMLReader
   parser.setFeature(xml.sax.handler.feature_namespaces, 0) # 2. turn off namespace processing
   Handler = MovieHandler()
   parser.setContentHandler( Handler ) # replace the default ContentHandler
   parser.parse("movies.xml")
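The handler above prints values as it goes and keeps only the last chunk passed to `characters`. A SAX parser is allowed to deliver the text of one element in several chunks, so a variant that buffers chunks and collects results into a list may be more robust. The sketch below is one such variant, not the article's implementation; `MovieCollector` and the shortened `SAMPLE` are hypothetical.

```python
import xml.sax

class MovieCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects each <movie> into a dict."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._current = None   # dict for the movie being read, if any
        self._buffer = []      # text chunks of the element being read

    def startElement(self, tag, attributes):
        if tag == "movie":
            self._current = {"title": attributes["title"]}
        self._buffer = []

    def characters(self, content):
        # May be called several times for one element, so append
        self._buffer.append(content)

    def endElement(self, tag):
        if tag == "movie":
            self.movies.append(self._current)
            self._current = None
        elif self._current is not None:
            self._current[tag] = "".join(self._buffer).strip()
        self._buffer = []

# Hypothetical shortened copy of the movie collection
SAMPLE = b"""<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <year>2013</year>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
</movie>
</collection>"""

collector = MovieCollector()
xml.sax.parseString(SAMPLE, collector)
for movie in collector.movies:
    print(movie)
```

Returning data instead of printing keeps the handler reusable, for example to filter movies by rating after parsing.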

DVD management system

<?xml version="1.0" encoding='utf-8'?>
<dvds>
    <dvd>
        <name>不堪回首的往事</name>
        <price>300</price>
        <state>1</state>
    </dvd>
    <dvd>
        <name>北京一夜</name>
        <price>400</price>
        <state>0</state>
    </dvd>
    <dvd>
        <name>南山南</name>
        <price>500</price>
        <state>1</state>
    </dvd>



</dvds>

Reading the initial data

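import xml.etree.ElementTree as ET

# Assumption: the original does not show the DVD class; this is a minimal stand-in
class DVD:
    def __init__(self):
        self.name=None
        self.price=None
        self.state=None

tree=ET.parse('测试.xml')
root=tree.getroot()
dvds={}
def getdvds():
    for dvd in root:
        # Build one DVD object per <dvd> element
        n_dvd=DVD()
        for ele in dvd:
            if ele.tag=='name':
                n_dvd.name=ele.text
            elif ele.tag=='price':
                n_dvd.price=ele.text
            elif ele.tag=='state':
                n_dvd.state=ele.text
        dvds[n_dvd.name]=n_dvd
    # Return only after every <dvd> has been read
    return dvds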
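A management system would also need to write changes back to the tree. The following sketch shows one way this might look, under several assumptions: the `DVD` dataclass, the `load_dvds` and `save_state` helpers, and the placeholder disc names are all hypothetical, and the meaning of the `state` values is not defined in the original, so the code treats them as opaque strings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class DVD:
    # Hypothetical record type; field names mirror the XML tags
    name: str
    price: int
    state: str

# Hypothetical inline sample with placeholder names
SAMPLE = """<dvds>
    <dvd><name>Disc A</name><price>300</price><state>1</state></dvd>
    <dvd><name>Disc B</name><price>400</price><state>0</state></dvd>
</dvds>"""

def load_dvds(root):
    """Read every <dvd> element into a dict keyed by name."""
    result = {}
    for node in root.findall("dvd"):
        dvd = DVD(
            name=node.findtext("name"),
            price=int(node.findtext("price")),
            state=node.findtext("state"),
        )
        result[dvd.name] = dvd
    return result

def save_state(root, dvd):
    """Copy the state of one DVD object back into the matching element."""
    for node in root.findall("dvd"):
        if node.findtext("name") == dvd.name:
            node.find("state").text = dvd.state

root = ET.fromstring(SAMPLE)
dvds = load_dvds(root)

dvds["Disc B"].state = "1"
save_state(root, dvds["Disc B"])

states = [node.findtext("state") for node in root.findall("dvd")]
print(states)
```

After `save_state`, wrapping `root` in `ET.ElementTree(root)` and calling `write` would persist the change to disk.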
