XML, DOM, and the Java Libraries for Working with Them

Bloonow

Article link: https://blog.youkuaiyun.com/weixin_44030017/article/details/107503593

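An XML element is everything from (and including) the start tag through (and including) the end tag. An element can contain other elements, text, attributes, or a mix of all of these. For example: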

<!-- book.xml -->
<bookstore>
    <book category="Classic">
        <title>Harry Potter</title>
        <author>JK. Rowling</author>
        <year>2005</year>
        <price>29.99$</price>
    </book>
</bookstore>

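An XML document must contain a root element, which is the ancestor of every other element. The elements of an XML document form a document tree, and any element can have child elements. The XML DOM treats an XML document as a tree structure, called the node tree.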

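DOM (Document Object Model) is a W3C (World Wide Web Consortium) standard. In the DOM, every component of an XML document is a node: the whole document is a document node (Document); each XML element is an element node (Element); each XML attribute is an attribute node (Attr); and the text inside an XML element is a text node (Text).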
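To make the node kinds concrete, here is a minimal sketch that walks a small document and labels each node by type. The class name NodeTreeDemo, its helper methods, and the one-line sample XML are assumptions made for illustration; only the javax.xml.parsers and org.w3c.dom calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTreeDemo {
    // Hypothetical sample, written on one line so that no whitespace Text nodes appear.
    static final String XML =
        "<bookstore><book category=\"Classic\"><title>Harry Potter</title></book></bookstore>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Maps the numeric node type to the interface names used in the text.
    static String kind(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE:      return "Text";
            default:                  return "Other";
        }
    }

    static void line(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(kind(node)).append(' ').append(node.getNodeName()).append('\n');
    }

    // Appends one line per node, indented by depth; attributes are listed under their element.
    static void describe(Node node, int depth, StringBuilder out) {
        line(node, depth, out);
        NamedNodeMap attrs = node.getAttributes(); // null unless the node is an Element
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                line(attrs.item(i), depth + 1, out);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth + 1, out);
        }
    }

    static String describe(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        describe(parse(xml), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe(XML));
    }
}
```

Running main prints one line per node. The document node comes first, the category attribute is listed under <book>, and the text of <title> shows up as a separate Text child rather than as part of the element itself.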

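Note that the whitespace and line breaks between elements are also parsed into text nodes (Text) in the org.w3c.dom.Node tree. As a result, getLength() returns a larger number than the elements you can see, and a traversal may print unexplained blank lines. Handle these extra nodes explicitly, for example by checking the node type before using a node.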
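One way to deal with the whitespace nodes is to filter the children by node type. The sketch below is only an illustration: WhitespaceDemo, childElements, and the sample XML are hypothetical names and data, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Hypothetical sample: two <book> elements separated by line breaks and indentation.
    static final String XML =
        "<bookstore>\n  <book>A</book>\n  <book>B</book>\n</bookstore>";

    static Element root(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // Keeps only the Element children, dropping Text nodes (and any comments).
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element bookstore = root(XML);
        System.out.println(bookstore.getChildNodes().getLength()); // 5: three whitespace Text nodes plus two elements
        System.out.println(childElements(bookstore).size());       // 2: only the <book> elements
    }
}
```

Filtering by type keeps the original tree untouched, which is usually the safer choice when the same document is written back out later.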

The javax.xml.parsers and org.w3c.dom packages provide the corresponding operations:

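import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbf.newDocumentBuilder();
Document docu = dBuilder.parse(new File("book.xml"));

// List of all <bookstore> element nodes; there is only one here, so its length is 1
NodeList bookstoreNodeList = docu.getElementsByTagName("bookstore");
Node bookstoreNode = bookstoreNodeList.item(0);    // the first <bookstore> element node

// Children of <bookstore>: the <book> element plus the whitespace Text nodes around it,
// so item(0) here is a whitespace Text node, not <book>
NodeList allChildsList = bookstoreNode.getChildNodes();

// Look the elements up by tag name to skip the whitespace Text nodes
Element bookElement = (Element) docu.getElementsByTagName("book").item(0);
Node titleNode = bookElement.getElementsByTagName("title").item(0);    // the <title> element node
String value = titleNode.getTextContent();    // "Harry Potter"; getNodeValue() returns null for an element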

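Document.getElementsByTagName(String tag) returns a NodeList of every element named tag, in document order, regardless of where the elements sit in the XML tree.
NodeList.item(int index) returns the Node at position index in the NodeList; indexes start at 0.
Node.getChildNodes() returns all child nodes of the current node. There are also getFirstChild(), getLastChild(), and similar methods.
Node.getTextContent() returns the text content of the node and its descendants; this is the method most commonly used.
Node.getNodeValue() returns the value of the current node: a text node (Text) returns its text, an attribute node (Attr) returns the attribute value, and an element node (Element) always returns null.
Node.getNodeName() returns the name of the current node: an element node (Element) returns the element name, an attribute node (Attr) returns the attribute name, and a text node (Text) always returns #text.
Node.getNodeType() returns the node type; comparing it with a predefined constant, as in Node.getNodeType() == Node.ELEMENT_NODE, tells whether a node is of a particular type.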
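The differences between getNodeName(), getNodeValue(), and getTextContent() are easiest to see side by side. The following sketch assumes a hypothetical class NodeAccessorsDemo and a one-line sample document; the summary helper exists only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeAccessorsDemo {
    // Hypothetical sample document with one attribute and one text-only child element.
    static final String XML =
        "<book category=\"Classic\"><title>Harry Potter</title></book>";

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    // Formats a node as "name -> value"; a null value is rendered as the string "null".
    static String summary(Node node) {
        return node.getNodeName() + " -> " + node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element book = parseRoot(XML);
        Node title = book.getElementsByTagName("title").item(0); // Element node
        Node text = title.getFirstChild();                        // Text node inside <title>
        Node category = book.getAttributeNode("category");        // Attr node

        System.out.println(summary(title));         // title -> null
        System.out.println(summary(text));          // #text -> Harry Potter
        System.out.println(summary(category));      // category -> Classic
        System.out.println(title.getTextContent()); // Harry Potter
    }
}
```

This is why getTextContent() is usually the convenient call on an element: it gathers the text of the Text children, whereas getNodeValue() on the element itself yields null.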

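Note that the encoding of the XML file being opened may not match what the DOM parser expects. In that case the file encoding can be specified explicitly, as follows.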

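import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Decode the bytes ourselves, so the parser receives characters rather than raw bytes
InputStream is = new FileInputStream("task.xml");
Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
InputSource iSource = new InputSource(reader);
iSource.setEncoding("UTF-8");

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbf.newDocumentBuilder();
Document docu = dBuilder.parse(iSource);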
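As a rough check that decoding through a Reader really controls the encoding, the sketch below writes a small file in ISO-8859-1 with no XML declaration and reads it back with the matching charset. EncodingDemo, parseWithCharset, roundTrip, and the sample content are assumptions for illustration; ISO-8859-1 was chosen only because StandardCharsets guarantees it is available.

```java
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingDemo {
    // Decodes the file with the given charset, then hands the characters to the parser.
    static Document parseWithCharset(Path file, Charset charset) throws Exception {
        try (Reader reader = new InputStreamReader(Files.newInputStream(file), charset)) {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        }
    }

    // Writes a hypothetical one-element document in ISO-8859-1 and reads its text back.
    static String roundTrip() throws Exception {
        Path file = Files.createTempFile("task", ".xml");
        try {
            // "caf\u00e9" contains a non-ASCII character, stored as the single byte 0xE9
            Files.write(file, "<task>caf\u00e9</task>".getBytes(StandardCharsets.ISO_8859_1));
            Document doc = parseWithCharset(file, StandardCharsets.ISO_8859_1);
            return doc.getDocumentElement().getTextContent();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip().equals("caf\u00e9")); // true when the charset matches the file
    }
}
```

The charset passed to InputStreamReader has to match how the file was actually saved; passing the wrong one does not raise an error by itself but produces garbled text.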