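Efficient XML parsing is something any good development framework has to provide. In the Java world there are three ways to process XML: DOM, SAX, and StAX. Plenty has already been written online about these three parsing models. So how do they actually differ in day-to-day use? Let's compare them side by side with three examples.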
First, we create an XML file named data.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <greetings>
        <greeting id="g1">Hello DOM</greeting>
        <greeting id="g2">Hello SAX</greeting>
        <greeting id="g3">Hello StAX</greeting>
    </greetings>
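We start by parsing this XML with DOM. DOM reads the entire XML document into memory in one pass and maps the data onto Java objects following the DOM structure shown in the figure below: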

The following code uses org.w3c.dom.* to parse the XML:

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // The class declaration is missing from the original listing;
    // TryDOM is a placeholder name.
    public class TryDOM {

        public static void main(String args[]) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Parse the whole file into an in-memory DOM tree.
            Document doc = builder.parse("data.xml");
            Element elem = doc.getDocumentElement();
            NodeList list = elem.getChildNodes();
            for (int i = 0; i < list.getLength(); i++) {
                Node node = list.item(i);
                // Only element nodes carry attributes; text nodes return null here.
                NamedNodeMap attributes = node.getAttributes();
                if (attributes != null) {
                    for (int j = 0; j < attributes.getLength(); j++) {
                        Node attr = attributes.item(j);
                        System.out.println("attr name: " + attr.getNodeName());
                        System.out.println("attr value: " + attr.getNodeValue());
                    }
                }
                System.out.println("node name: " + node.getNodeName());
                System.out.println("node type: " + node.getNodeType());
                System.out.println("node value: " + node.getNodeValue());
                System.out.println("content: " + node.getTextContent());
                System.out.println("---------");
            }
        }
    }

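In the program's output, note the entries with node type: 3. Type 3 is a text node (Node.TEXT_NODE), and here each one holds the whitespace between the greeting elements: DOM treats that whitespace as content in its own right. The example shows DOM's defining trait: it reads the XML data into memory in one pass and creates object instances according to the structure that DOM prescribes. For small XML files, parsing with DOM is very convenient. For large files, however, DOM parsing is inefficient and wastes a lot of memory, so we need to parse the XML as a "stream" instead. SAX and StAX are tools for exactly that.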
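Since whitespace between elements shows up as text nodes, DOM code that only cares about elements usually filters on the node type. The sketch below is a minimal illustration under stated assumptions: the class name DomElementsOnly and the helper childElements are hypothetical and not part of the original article, and the XML is passed in as a string rather than read from data.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original article.
class DomElementsOnly {

    // Returns "id=text" for every child element of the root, skipping text nodes.
    static List<String> childElements(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // whitespace between elements arrives as TEXT_NODE (type 3)
                }
                Element elem = (Element) node;
                result.add(elem.getAttribute("id") + "=" + elem.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<greetings>\n"
                + "  <greeting id=\"g1\">Hello DOM</greeting>\n"
                + "  <greeting id=\"g2\">Hello SAX</greeting>\n"
                + "</greetings>";
        System.out.println(childElements(xml)); // prints [g1=Hello DOM, g2=Hello SAX]
    }
}
```

The whole document is still built in memory first; the filter only changes which nodes the loop looks at.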
Let's look at SAX first:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.AttributeList;
    import org.xml.sax.HandlerBase;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    // HandlerBase and AttributeList are the original SAX1 API. They are deprecated
    // in favor of DefaultHandler and Attributes, but remain available in the JDK.
    public class TrySAX extends HandlerBase {

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            String value = new String(ch, start, length);
            // Skip whitespace-only text between elements.
            if (!value.trim().equals("")) {
                System.out.println("Text: " + value);
            }
        }

        @Override
        public void endDocument() throws SAXException {
            System.out.println("End Document");
            super.endDocument();
        }

        @Override
        public void endElement(String name) throws SAXException {
            System.out.println("End Element:" + name);
            super.endElement(name);
        }

        @Override
        public void startDocument() throws SAXException {
            System.out.println("Start Document.");
            super.startDocument();
        }

        @Override
        public void startElement(String name, AttributeList attributes)
                throws SAXException {
            System.out.println("Start Element: " + name);
            for (int i = 0, n = attributes.getLength(); i < n; ++i)
                System.out.println("Attribute: " + attributes.getName(i) + "="
                        + attributes.getValue(i));
            super.startElement(name, attributes);
        }

        public static void main(String args[]) throws Exception {
            // data.xml declares UTF-8, so decode it explicitly rather than
            // relying on the platform default charset.
            try (InputStreamReader reader = new InputStreamReader(
                    new FileInputStream(new File("data.xml")), StandardCharsets.UTF_8)) {
                InputSource source = new InputSource(reader);
                HandlerBase handler = new TrySAX();

                SAXParserFactory factory = SAXParserFactory.newInstance();
                SAXParser parser = factory.newSAXParser();
                // The parser drives the handler: it calls the methods above as it reads.
                parser.parse(source, handler);
            }
        }
    }

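Running the program produces the following output:

    Start Document.
    Start Element: greetings
    Start Element: greeting
    Attribute: id=g1
    Text: Hello DOM
    End Element:greeting
    Start Element: greeting
    Attribute: id=g2
    Text: Hello SAX
    End Element:greeting
    Start Element: greeting
    Attribute: id=g3
    Text: Hello StAX
    End Element:greeting
    End Element:greetings
    End Document
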
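SAX reads the XML as a stream and breaks its contents down into a series of events. Whenever an event occurs, the method that corresponds to it is invoked. The SAX event model looks like this: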

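With this model, we simply write the business logic we need inside the methods that correspond to the events. Programming this way feels a little odd, though: the business logic has to be packed into the HandlerBase that hosts the event methods, rather than into the business class where we would expect it to live. We call this the "push" approach.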
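To make the "push" style concrete, here is a minimal sketch in which the business state (a list of greetings) has to live inside the handler because the parser decides when each method is called. It uses the SAX2 classes DefaultHandler and Attributes, which replaced the deprecated HandlerBase and AttributeList. The class name GreetingCollector and the method collect are hypothetical, and the XML is supplied as a string for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the article's own listing is TrySAX.
class GreetingCollector extends DefaultHandler {

    private final List<String> greetings = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("greeting".equals(qName)) {
            currentId = attributes.getValue("id");
            text.setLength(0); // start collecting text for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text run in several chunks, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("greeting".equals(qName)) {
            greetings.add(currentId + "=" + text.toString().trim());
        }
    }

    // Parses the XML and returns "id=text" for each greeting element.
    static List<String> collect(String xml) {
        GreetingCollector handler = new GreetingCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.greetings;
    }

    public static void main(String[] args) {
        String xml = "<greetings><greeting id=\"g1\">Hello SAX</greeting></greetings>";
        System.out.println(collect(xml)); // prints [g1=Hello SAX]
    }
}
```

Note that the caller only gets its result after parse returns; everything in between is driven by the parser.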
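Is it possible, then, to keep the business logic out of the event methods and have our own code call the handler to process the XML instead? The answer is yes: StAX works in exactly this second way. Unlike SAX, StAX processes XML with a "pull" approach. Again, an example shows how StAX is used: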

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    public class TryCursorMode {

        private void parseXML() throws IOException, XMLStreamException {
            try (InputStream in = new FileInputStream("data.xml")) {
                XMLInputFactory inFactory = XMLInputFactory.newInstance();
                XMLStreamReader r = inFactory.createXMLStreamReader(in);
                // The reader starts on START_DOCUMENT; every later event is
                // pulled explicitly with next().
                int event = r.getEventType();
                while (true) {
                    switch (event) {
                    case XMLStreamConstants.START_DOCUMENT:
                        System.out.println("Start Document.");
                        break;
                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("Start Element: " + r.getName());
                        for (int i = 0, n = r.getAttributeCount(); i < n; ++i)
                            System.out.println("Attribute: "
                                    + r.getAttributeName(i) + "="
                                    + r.getAttributeValue(i));
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Skip whitespace-only text between elements.
                        if (r.isWhiteSpace())
                            break;
                        System.out.println("Text: " + r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("End Element:" + r.getName());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        System.out.println("End Document.");
                        break;
                    }
                    if (!r.hasNext())
                        break;
                    event = r.next();
                }
                r.close();
            }
        }

        public static void main(String args[]) throws Exception {
            TryCursorMode demo = new TryCursorMode();
            demo.parseXML();
        }
    }

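Running this program gives the following result:

    Start Document.
    Start Element: greetings
    Start Element: greeting
    Attribute: id=g1
    Text: Hello DOM
    End Element:greeting
    Start Element: greeting
    Attribute: id=g2
    Text: Hello SAX
    End Element:greeting
    Start Element: greeting
    Attribute: id=g3
    Text: Hello StAX
    End Element:greeting
    End Element:greetings
    End Document.
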
As you can see, the design of the StAX API is very different from that of SAX: with StAX, the logic for processing the XML moves into our own main code.
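One practical consequence of the pull style is that the caller can stop reading as soon as it has what it needs. The following sketch, under stated assumptions, looks up a single greeting by id and returns without consuming the rest of the document. The class name FindGreeting and the method findById are hypothetical, and the XML is supplied as a string for illustration.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the article's own listing is TryCursorMode.
class FindGreeting {

    // Returns the text of the <greeting> with the given id, or null if there is none.
    static String findById(String xml, String id) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "greeting".equals(r.getLocalName())
                            && id.equals(r.getAttributeValue(null, "id"))) {
                        // Our code, not the parser, decides to stop here.
                        return r.getElementText();
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<greetings>"
                + "<greeting id=\"g1\">Hello DOM</greeting>"
                + "<greeting id=\"g2\">Hello SAX</greeting>"
                + "<greeting id=\"g3\">Hello StAX</greeting>"
                + "</greetings>";
        System.out.println(findById(xml, "g2")); // prints Hello SAX
    }
}
```

With SAX the same early exit would typically require throwing an exception out of the handler, because the parser owns the loop.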
These three pieces of code show how the three ways of processing XML differ. There is much more depth to each of them; if you want to learn more, see the references below.
References:
- Using DOM to Traverse XML – http://onjava.com/pub/a/onjava/2001/02/08/dom.html
- SAX Tutorial 1 – http://developerlife.com/tutorials/?p=29
- Using the SAX Parser – http://www.javacommerce.com/displaypage.jsp?name=saxparser1.sql&id=18232
- StAX'ing up XML, Part 1: An introduction to Streaming API for XML (StAX) – http://www.ibm.com/developerworks/xml/library/x-stax1.html
- Tip: Parsing XML documents partially with StAX – http://www.ibm.com/developerworks/xml/library/x-tipstx2/
- An Introduction to StAX – http://www.xml.com/pub/a/2003/09/17/stax.html
liweinan 2010-03-26 07:00PM