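dom4j is a third-party open-source library, while DOM, SAX, and StAX ship with the JDK.
The key difference: DOM and dom4j load the entire document into memory, so their memory requirements are relatively high.
SAX and StAX do not load the whole document into memory; they read the XML as a stream while parsing.
SAX and StAX are therefore more efficient when parsing large XML files.
If the XML document needs to be modified, choose dom4j or DOM.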
1. dom4j:
Add the dependency:
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>1.6.1</version>
</dependency>
A custom utility class:
package com.utils;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import java.util.HashMap;
import java.util.Map;
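/**
* XML parsing helper
*/
public class ParseXml {
/**
* Extracts a value from the XML of a data-list response
* @param xmlStr the XML document as a string
* @return the text of the target element, or null if xmlStr is null
* @throws DocumentException if xmlStr is not well-formed XML
*/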
public static String parseSearchXml(String xmlStr) throws DocumentException {
if (xmlStr==null){
return null;
}
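//Parse the returned content into an XML document
Document document= DocumentHelper.parseText(xmlStr);
//Get the root element
Element root=document.getRootElement();
//Walk down level by level to the target child element and read its text.
//Note: element(...) returns null when a child is missing, which would cause a NullPointerException here.
return root.element("eleName1").element("eleName2").element("eleName3").getText();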
}
}
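2. Built-in JDK approach: DOM
2.1 Reading a file
The XML file consists of a <font> root element that contains a <name> element and a <size> element, with whitespace (line breaks) between the tags.
The <font> element therefore has two kinds of child nodes: Text nodes and Element nodes. The Text nodes are the whitespace between <font> and <name>, between </name> and <size>, and between </size> and </font>; the Element nodes are <name> and <size>.
As a result, rootElement.getChildNodes() in the code below has a length of 5 rather than 2, because it includes the Text nodes, which usually need to be filtered out.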
//Requires imports from javax.xml.parsers, org.w3c.dom, org.xml.sax and java.io
try {
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new File("/work/xml.xml"));
    //Get the root element
    Element rootElement = document.getDocumentElement();
    //Get the child nodes of the root element
    NodeList nodeList = rootElement.getChildNodes();
    System.out.println(nodeList.getLength());
    //NodeList.item(i) returns each child Node
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        //<font> has both Text and Element children; skip the whitespace Text nodes
        if (!(node instanceof Element)) {
            continue;
        }
        Element element = (Element) node;
        //Get the tag name
        String tagName = element.getTagName();
        //If the element is known to contain a single Text node, getFirstChild() is enough
        Text text = (Text) element.getFirstChild();
        //Get the value of the Text node
        String kk = text.getData().trim();
        //Alternatively, getTextContent() returns the text content of the element
        String jj = element.getTextContent();
    }
} catch (ParserConfigurationException | IOException | SAXException e) {
    //DocumentException belongs to dom4j and is never thrown by the JDK DOM API, so it is not caught here
    e.printStackTrace();
}
The difference between the two approaches: dom4j's DocumentHelper.parseText(...) parses XML supplied as a string, whereas documentBuilder.parse(...) accepts a URI, a File, an InputStream, or an InputSource.
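The JDK DOM parser can also read XML held in a string by wrapping it in an InputSource. The sketch below does that and, at the same time, illustrates the child-node count discussed earlier. The class name DomStringParseDemo, the sample document, and the font name in it are assumptions made for illustration, not part of the original article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomStringParseDemo {
    // Assumed sample document: a <font> element with <name> and <size> on separate lines.
    static final String XML =
            "<font>\n  <name>SimHei</name>\n  <size>38</size>\n</font>";

    /** Parses an XML string into a DOM Document; checked exceptions are wrapped. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts only the children that are elements, skipping whitespace Text nodes. */
    static int countChildElements(Element parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        // Three whitespace Text nodes plus two elements
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        System.out.println("child elements:  " + countChildElements(root));
    }
}
```

Filtering with instanceof Element, as in the article's loop, is what reduces the five child nodes to the two that matter.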
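With the second approach, the parser can be told not to report the whitespace between tags, as follows.
First, declare the content model with a <!DOCTYPE> so that the XML can be validated: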
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE font[
<!ELEMENT font (name,size)>
]>
<font>
<name>黑体</name>
<size>38</size>
</font>
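Then enable the corresponding setting on the factory:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
// Ignore whitespace between tags. Per the Javadoc this setting relies on the content model
// and requires validating mode, so setValidating(true) may also be needed.
documentBuilderFactory.setIgnoringElementContentWhitespace(true);
...
The following whitespace check is then no longer needed: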
if (! (node instanceof Element)){
continue;
}
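To see the effect of the setting, here is a minimal sketch that parses the same document twice, once with default settings and once with whitespace ignoring enabled. It assumes a slightly extended DTD that also declares <name> and <size>, so that the document validates cleanly, and it turns on setValidating(true) as the Javadoc requires. The class name IgnoreWhitespaceDemo and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IgnoreWhitespaceDemo {
    // Assumed sample: every element is declared so the document is valid against its DTD.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE font [\n"
          + "  <!ELEMENT font (name,size)>\n"
          + "  <!ELEMENT name (#PCDATA)>\n"
          + "  <!ELEMENT size (#PCDATA)>\n"
          + "]>\n"
          + "<font>\n  <name>SimHei</name>\n  <size>38</size>\n</font>";

    /** Returns the number of child nodes of the root element. */
    static int childCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            if (ignoreWhitespace) {
                // The content model comes from the DTD, so validation is switched on as well
                factory.setValidating(true);
                factory.setIgnoringElementContentWhitespace(true);
            }
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            return root.getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default settings:    " + childCount(false));
        System.out.println("whitespace ignored:  " + childCount(true));
    }
}
```

Only whitespace inside elements whose content model is element-only (here <font>) is dropped; text inside <name> and <size> is left untouched.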
2.2 Creating an XML document or adding content to one
@Test
public void test29() throws ParserConfigurationException, TransformerException, FileNotFoundException {
//Create the DOM document
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder =factory.newDocumentBuilder();
Document document =documentBuilder.newDocument();
Element rootElement =document.createElement("root");
Element fontElement=document.createElement("font");
Element partElement=document.createElement("part");
Element nameElement=document.createElement("name");
Element ageElement=document.createElement("age");
Text nameText =document.createTextNode("zhang3");
Text ageText =document.createTextNode("18");
//Attach the root element to the document;
//otherwise the XML output is empty
document.appendChild(rootElement);
rootElement.appendChild(partElement);
partElement.appendChild(fontElement);
fontElement.appendChild(nameElement);
fontElement.appendChild(ageElement);
nameElement.appendChild(nameText);
ageElement.appendChild(ageText);
//Write out the XML
TransformerFactory transformerFactory=TransformerFactory.newInstance();
Transformer transformer=transformerFactory.newTransformer();
//Emit a <!DOCTYPE> declaration
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,"PUBLIC");
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,"SYSTEMID");
//Add extra whitespace when writing the result tree
transformer.setOutputProperty(OutputKeys.INDENT,"yes");
transformer.setOutputProperty(OutputKeys.METHOD,"xml");
// transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","2");
transformer.transform(new DOMSource(document),new StreamResult(new FileOutputStream(new File("/work/axs.xml"))));
}
Resulting XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE root PUBLIC "PUBLIC" "SYSTEMID">
<root>
<part>
<font>
<name>zhang3</name>
<age>18</age>
</font>
</part>
</root>
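The example above builds a document from scratch. Adding content to an existing document follows the same pattern: parse it, append new nodes, and serialize the result. The following is a rough sketch under that assumption; the class DomAppendDemo, its appendChild helper, and the input string are made up for illustration, and the output goes to a String rather than a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomAppendDemo {
    /** Parses xml, appends <tag>text</tag> to the root element, and returns the serialized result. */
    static String appendChild(String xml, String tag, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Create the new element with its text and attach it to the root
            Element child = doc.createElement(tag);
            child.appendChild(doc.createTextNode(text));
            doc.getDocumentElement().appendChild(child);

            // Serialize the modified tree with an identity Transformer
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendChild("<font><name>zhang3</name></font>", "age", "18"));
    }
}
```

Writing to a StringWriter instead of a FileOutputStream also avoids having to close a file stream by hand.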
2.3 Namespaces: the parser is not namespace-aware by default, so this must be enabled explicitly
//Enable namespace awareness
documentBuilderFactory.setNamespaceAware(true);
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
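<!-- This declaration means that font contains a name element followed by a size element; content models support regex-like operators, and each rule is written as an <!ELEMENT> declaration -->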
<!DOCTYPE font[
<!ELEMENT font (name,size)>
]>
<xsd:font xmlns:xsd="http://baidu.com">
<part>
<name title="this is title">
<value>黑体</value>
<age>11</age>
</name>
<size>38</size>
<city>上海</city>
<fangxiang>东南</fangxiang>
</part>
<part>
<name>宋体</name>
<size>23</size>
</part>
<part>
<name>隶书</name>
<size>43</size>
</part>
</xsd:font>
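The <font> element is given the namespace shown above. To explain the declaration:
"xmlns:xsd": the namespace prefix is xsd and the namespace URI is http://baidu.com
In other words, the element is "the font element in the namespace http://baidu.com".
Reading the element's names: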
Element element=document.getDocumentElement();
//Get the qualified tag name (prefix included)
System.out.println(element.getTagName());
//Get the local name (prefix removed)
System.out.println(element.getLocalName());
//Get the namespace URI
System.out.println(element.getNamespaceURI());
Output:
xsd:font
font
http://baidu.com
getLocalName and getNamespaceURI only return these values when documentBuilderFactory.setNamespaceAware(true) has been called; otherwise they return null.
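A small sketch can make the difference concrete by parsing one document with namespace awareness on and off and comparing what the root element reports. The class name NamespaceAwareDemo and the shortened sample document are assumptions for illustration, and the behaviour described is that of the JDK's built-in parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareDemo {
    // Assumed sample: a shortened version of the namespaced <xsd:font> document.
    static final String XML =
            "<xsd:font xmlns:xsd=\"http://baidu.com\"><part/></xsd:font>";

    /** Parses XML and returns its root element, with or without namespace awareness. */
    static Element root(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] {true, false}) {
            Element e = root(aware);
            System.out.println("namespaceAware=" + aware
                    + " tagName=" + e.getTagName()
                    + " localName=" + e.getLocalName()
                    + " namespaceURI=" + e.getNamespaceURI());
        }
    }
}
```

getTagName() is the same in both modes because it is simply the name as written in the document; only the namespace-derived values differ.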
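3. Streaming parsers
SAX parser:
The XML above is used as the example.
DOM has to read the complete XML document and build the full tree. When the document is large and the full tree structure is not needed, so that nodes can be handled as they are encountered, a streaming parser is the better choice.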
@Test
public void test25() throws ParserConfigurationException, SAXException, IOException {
DefaultHandler defaultHandler=new DefaultHandler(){
public void startElement (String uri, String localName,
String qName, Attributes attributes)
throws SAXException
{
//Check attributes for null before dereferencing it
if (localName.equals("name") && attributes!=null){
for (int i=0;i<attributes.getLength();i++){
String name=attributes.getLocalName(i);
String b=attributes.getURI(i);
String c=attributes.getQName(i);
if (name.equals("title")){
System.out.println(attributes.getValue(i));
}
}
}
}
};
SAXParserFactory saxParserFactory=SAXParserFactory.newInstance();
saxParserFactory.setNamespaceAware(true);
saxParserFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd",false);
SAXParser saxParser =saxParserFactory.newSAXParser();
saxParser.parse("/work/xml.xml",defaultHandler);
}
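Meaning of the parameters:
With namespace awareness on (saxParserFactory.setNamespaceAware(true)):
uri is the namespace URI (non-empty for an element that is in a namespace); localName is the tag name without the namespace prefix; qName is the tag name including the prefix
With namespace awareness off (saxParserFactory.setNamespaceAware(false)):
uri is empty; localName is empty; qName is the tag name exactly as written in the document (any prefix is still part of it, because the prefix is not interpreted)
//Local name of the attribute at the given index
String name=attributes.getLocalName(i);
//Namespace URI of the attribute at the given index
String b=attributes.getURI(i);
//Qualified name of the attribute at the given index
String c=attributes.getQName(i);
//Value of the attribute at the given index
String d=attributes.getValue(i);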
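The parameter rules above can be checked with a short experiment that records the three startElement arguments in both modes. This is a sketch only: SaxNamesDemo and the shortened sample document are hypothetical, and the empty strings seen with namespace awareness off reflect the JDK's built-in SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNamesDemo {
    // Assumed sample: a namespaced root with one child that has no namespace.
    static final String XML =
            "<xsd:font xmlns:xsd=\"http://baidu.com\"><name title=\"this is title\"/></xsd:font>";

    /** Returns "[uri|localName|qName]" for every start tag, in document order. */
    static List<String> startElementArgs(boolean namespaceAware) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(XML)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            seen.add("[" + uri + "|" + localName + "|" + qName + "]");
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("aware:   " + startElementArgs(true));
        System.out.println("unaware: " + startElementArgs(false));
    }
}
```

Because qName is the only value that is filled in both modes, handler code that must work either way usually falls back to qName when localName is empty.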
StAX parser:
public void test26() throws IOException, XMLStreamException {
String url="http://www.w3c.org";
InputStream in=new URL(url).openStream();
XMLInputFactory xmlInputFactory=XMLInputFactory.newInstance();
//Whether to validate the XML
xmlInputFactory.setProperty(XMLInputFactory.IS_VALIDATING,false);
//Whether namespace processing is enabled; the default is true
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE,false);
XMLStreamReader parse =xmlInputFactory.createXMLStreamReader(in);
while (parse.hasNext()){
int i=parse.next();
//Start tag
if (i==XMLStreamConstants.START_ELEMENT){
QName qName =parse.getName();
String localName=parse.getLocalName();
if ("a".equals(localName)){
String value=parse.getAttributeValue(null,"href");
if (value!=null){
System.out.println(value);
}
}
}
}
}
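The test above depends on a network resource, so here is an offline sketch of the same pull-parsing loop that reads from a string and collects the href values instead of printing them. StaxLinksDemo, its hrefs helper, and the sample markup are assumptions for illustration; the factory is left at its default settings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxLinksDemo {
    /** Returns the href attribute of every <a> element that has one. */
    static List<String> hrefs(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The application pulls events one at a time; nothing is pushed to a handler
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "a".equals(reader.getLocalName())) {
                        String href = reader.getAttributeValue(null, "href");
                        if (href != null) {
                            result.add(href);
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<html><body><a href=\"https://example.com/one\">1</a>"
                + "<a>no link</a><a href=\"two.html\">2</a></body></html>";
        System.out.println(hrefs(xml));
    }
}
```

Unlike SAX, where the parser calls back into a handler, the loop here decides when to ask for the next event, which makes it easy to stop early once the needed data has been found.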
The event constants as defined in the JDK source:
package javax.xml.stream;
/**
* This interface declares the constants used in this API.
* Numbers in the range 0 to 256 are reserved for the specification,
* user defined events must use event codes outside that range.
*
* @since 1.6
*/
public interface XMLStreamConstants {
/**
* Indicates an event is a start element
* @see javax.xml.stream.events.StartElement
*/
public static final int START_ELEMENT=1;
/**
* Indicates an event is an end element
* @see javax.xml.stream.events.EndElement
*/
public static final int END_ELEMENT=2;
/**
* Indicates an event is a processing instruction
* @see javax.xml.stream.events.ProcessingInstruction
*/
public static final int PROCESSING_INSTRUCTION=3;
/**
* Indicates an event is characters
* @see javax.xml.stream.events.Characters
*/
public static final int CHARACTERS=4;
/**
* Indicates an event is a comment
* @see javax.xml.stream.events.Comment
*/
public static final int COMMENT=5;
/**
* The characters are white space
* (see [XML], 2.10 "White Space Handling").
* Events are only reported as SPACE if they are ignorable white
* space. Otherwise they are reported as CHARACTERS.
* @see javax.xml.stream.events.Characters
*/
public static final int SPACE=6;
/**
* Indicates an event is a start document
* @see javax.xml.stream.events.StartDocument
*/
public static final int START_DOCUMENT=7;
/**
* Indicates an event is an end document
* @see javax.xml.stream.events.EndDocument
*/
public static final int END_DOCUMENT=8;
/**
* Indicates an event is an entity reference
* @see javax.xml.stream.events.EntityReference
*/
public static final int ENTITY_REFERENCE=9;
/**
* Indicates an event is an attribute
* @see javax.xml.stream.events.Attribute
*/
public static final int ATTRIBUTE=10;
/**
* Indicates an event is a DTD
* @see javax.xml.stream.events.DTD
*/
public static final int DTD=11;
/**
* Indicates an event is a CDATA section
* @see javax.xml.stream.events.Characters
*/
public static final int CDATA=12;
/**
* Indicates the event is a namespace declaration
*
* @see javax.xml.stream.events.Namespace
*/
public static final int NAMESPACE=13;
/**
* Indicates a Notation
* @see javax.xml.stream.events.NotationDeclaration
*/
public static final int NOTATION_DECLARATION=14;
/**
* Indicates a Entity Declaration
* @see javax.xml.stream.events.NotationDeclaration
*/
public static final int ENTITY_DECLARATION=15;
}
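To connect these constants to actual parsing, the sketch below walks a tiny document and records the name of each event the reader reports. The class StaxEventNamesDemo, its name helper (which covers only a handful of the constants), and the sample input are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxEventNamesDemo {
    /** Maps a few of the integer event codes to readable names. */
    static String name(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            case XMLStreamConstants.COMMENT:        return "COMMENT";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            default:                                return "OTHER(" + eventType + ")";
        }
    }

    /** Lists the events for a document, starting with the reader's initial event. */
    static List<String> events(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // A freshly created reader is positioned on START_DOCUMENT
            result.add(name(reader.getEventType()));
            while (reader.hasNext()) {
                result.add(name(reader.next()));
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(events("<a><!--c-->hi</a>"));
    }
}
```

Comparing against the named constants rather than raw integers keeps the parsing loop readable and independent of the numeric values.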