Java APIs for XML: DOM (1)

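This article describes the node interfaces of the Document Object Model (DOM) in the Java API for XML Processing (JAXP), including their types, properties, and usage, and walks through an XML parsing workflow with an example.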


# Java API for XML Processing (JAXP)
# Study Notes on the Java Tutorial

Document Object Model APIs

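The Node interface is the primary datatype of the Document Object Model; it represents a single node in the document tree. Node has many sub-interfaces, and the values of nodeName, nodeValue, and attributes depend on the node type, as shown in the table below: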

| Node (sub-interface) | Named constant | nodeName | nodeValue | attributes |
|---|---|---|---|---|
| Attr | Node.ATTRIBUTE_NODE | same as Attr.name | same as Attr.value | null |
| CDATASection | Node.CDATA_SECTION_NODE | "#cdata-section" | same as CharacterData.data, the content of the CDATA section | null |
| Comment | Node.COMMENT_NODE | "#comment" | same as CharacterData.data, the content of the comment | null |
| Document | Node.DOCUMENT_NODE | "#document" | null | null |
| DocumentFragment | Node.DOCUMENT_FRAGMENT_NODE | "#document-fragment" | null | null |
| DocumentType | Node.DOCUMENT_TYPE_NODE | same as DocumentType.name | null | null |
| Element | Node.ELEMENT_NODE | same as Element.tagName | null | NamedNodeMap |
| Entity | Node.ENTITY_NODE | entity name | null | null |
| EntityReference | Node.ENTITY_REFERENCE_NODE | name of entity referenced | null | null |
| Notation | Node.NOTATION_NODE | notation name | null | null |
| ProcessingInstruction | Node.PROCESSING_INSTRUCTION_NODE | same as ProcessingInstruction.target | same as ProcessingInstruction.data | null |
| Text | Node.TEXT_NODE | "#text" | same as CharacterData.data, the content of the text node | null |
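
To make the table concrete, here is a minimal sketch that parses a small in-memory document and prints nodeName, nodeValue, and attributes for a few node types. The class name `NodePropertiesDemo`, its helper methods, and the sample XML are assumptions made for illustration; only the standard JAXP and `org.w3c.dom` calls come from the API.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class NodePropertiesDemo {

    /** Parses an XML string with default factory settings. */
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    /** Formats the three properties from the table as "nodeName | nodeValue | attributes". */
    static String describe(Node n) {
        String attrs = (n.getAttributes() == null) ? "null" : "NamedNodeMap";
        return n.getNodeName() + " | " + n.getNodeValue() + " | " + attrs;
    }

    public static void main(String[] args) {
        Document doc = parse("<book id=\"42\"><!--note-->Hello</book>");
        Node book = doc.getDocumentElement();

        System.out.println(describe(doc));                                       // #document | null | null
        System.out.println(describe(book));                                      // book | null | NamedNodeMap
        System.out.println(describe(book.getAttributes().getNamedItem("id")));   // id | 42 | null
        System.out.println(describe(book.getFirstChild()));                      // #comment | note | null
        System.out.println(describe(book.getLastChild()));                       // #text | Hello | null
    }
}
```

Only the Element node returns a non-null `getAttributes()` result, which matches the last column of the table.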

 

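| Node type | Description | Children |
|---|---|---|
| Document | Represents the entire document (the root of the DOM tree). | Element (max. one), ProcessingInstruction, Comment, DocumentType (max. one) |
| DocumentFragment | Represents a lightweight Document object that holds a portion of a document. | Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference |
| DocumentType | Provides an interface to the entities defined for the document. | None |
| ProcessingInstruction | Represents a processing instruction. | None |
| EntityReference | Represents an entity reference. | Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference |
| Element | Represents an element. | Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference |
| Attr | Represents an attribute. | Text, EntityReference |
| Text | Represents the textual content of an element or attribute. | None |
| CDATASection | Represents a CDATA section in the document (text that the parser does not treat as markup). | None |
| Comment | Represents a comment. | None |
| Entity | Represents an entity. | Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference |
| Notation | Represents a notation declared in the DTD. | None |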
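
The parent/child rules in this table can be checked with a short sketch. The class `ChildNodesDemo`, its helpers, and the sample XML are hypothetical; the sketch assumes the JDK's default DOM implementation, which enforces the "at most one Element" rule on Document by throwing a `DOMException`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class ChildNodesDemo {

    /** Parses an XML string with default factory settings. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    /** Collects the nodeName of every direct child of the given node. */
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    /** Returns true if adding a second root element is refused with HIERARCHY_REQUEST_ERR. */
    static boolean rejectsSecondRoot(Document doc) {
        try {
            doc.appendChild(doc.createElement("second"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<!--head--><?style css?><root a=\"1\"><item/>text</root>");

        // Document children: a comment, a processing instruction, and the single root element
        System.out.println(childNames(doc));                      // [#comment, style, root]

        // Element children: the attribute "a" is not among them
        System.out.println(childNames(doc.getDocumentElement())); // [item, #text]

        System.out.println(rejectsSecondRoot(doc));               // true
    }
}
```

Note that the attribute `a` does not show up as a child of `root`: attributes are reached through `getAttributes()`, not through the child list.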

XML DOM parsing can be divided into three steps:

1. Instantiate the Factory and Set Properties

2. Get a Parser and set Error Handler

3. Parse the File and Get DOM Tree
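
As a compact illustration of these three steps, the following sketch parses an in-memory string instead of a file and shows how two factory properties change the resulting tree. The class `ThreeStepParse`, the helper `rootChildNames`, and the sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class ThreeStepParse {

    /** Runs the three steps and returns the nodeNames of the root element's children. */
    static List<String> rootChildNames(String xml, boolean coalesce, boolean ignoreComments) {
        try {
            // Step 1: instantiate the factory and set properties
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(coalesce);
            dbf.setIgnoringComments(ignoreComments);

            // Step 2: get a parser (this sketch keeps the default error handler)
            DocumentBuilder db = dbf.newDocumentBuilder();

            // Step 3: parse the input and get the DOM tree
            Document doc = db.parse(new InputSource(new StringReader(xml)));

            List<String> names = new ArrayList<>();
            for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
                names.add(c.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>x<![CDATA[y]]>z<!--c--></a>";

        // Default-like settings keep CDATA sections and comments as separate nodes
        System.out.println(rootChildNames(xml, false, false)); // [#text, #cdata-section, #text, #comment]

        // With coalescing and comment ignoring enabled, those node types should disappear
        System.out.println(rootChildNames(xml, true, true));
    }
}
```

The properties must be set on the factory before `newDocumentBuilder()` is called, because the builder is created to satisfy the factory's configuration at that moment.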

The domparser class implements the three steps above. The code is as follows:

package dom;

import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class domparser {
    static final String outputEncoding = "UTF-8";
    
    /* Constants used for XML validation */
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
        
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {

        /*
         *  dtdValidate          : DTD validation, if true, do validation 
         *  xsdValidate          : W3C XML Schema validation, if true, do validation 
         *  schemaSource         : schema source XSD file
         *  ignoreWhitespace     : if true, ignore white space
         *  ignoreComments       : if true, ignore comments
         *  putCDATAIntoText     : if true, put CDATA into Text nodes
         *  createEntityRefs     : create EntityReference nodes
         */
        String filename = "/sandbox/javatest/data.xml";
        boolean dtdValidate = false;
        boolean xsdValidate = false;
        String schemaSource = null;
        boolean ignoreWhitespace = true;
        boolean ignoreComments = false;
        boolean putCDATAIntoText = false;
        boolean createEntityRefs = false;

        /**  Step 1: create a DocumentBuilderFactory and configure it */
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        /*
         *  Set namespaceAware to true to get a DOM Level 2 tree with nodes containing namespace information.
         */
        dbf.setNamespaceAware(true);
        
        // Set the validation mode: no validation, DTD validation, or XSD validation
        dbf.setValidating(dtdValidate || xsdValidate);
        if (xsdValidate) {
            try {
                dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            } catch (IllegalArgumentException x) {
                // This can happen if the parser does not support JAXP 1.2
                System.err.println(
                    "Error: JAXP DocumentBuilderFactory attribute not recognized: "
                    + JAXP_SCHEMA_LANGUAGE);
                System.err.println(
                    "Check to see if parser conforms to JAXP 1.2 spec.");
                System.exit(1);
            }
        }

        // Set the schema source.
        if (schemaSource != null) {
            dbf.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
        }
        
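        // Optional: set various configuration options
        dbf.setIgnoringComments(ignoreComments);
        // Note: ignoring element-content whitespace relies on the content model,
        // so it only takes effect when the parser is in validating mode.
        dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);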
        dbf.setCoalescing(putCDATAIntoText);
        // The opposite of creating entity reference nodes is expanding them inline
        dbf.setExpandEntityReferences(!createEntityRefs);

        /** Step 2: create a DocumentBuilder that satisfies the constraints specified by the DocumentBuilderFactory*/
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Set an ErrorHandler before parsing
        OutputStreamWriter errorWriter = new OutputStreamWriter(System.err, outputEncoding);
        db.setErrorHandler( new MyErrorHandler(new PrintWriter(errorWriter, true)));

        /** Step 3: parse the input file and handle dom tree*/
        Document doc = db.parse(new File(filename));

        // handling the DOM tree
        OutputStreamWriter outWriter = new OutputStreamWriter(System.out, outputEncoding);
        new domecho(new PrintWriter(outWriter, true)).echo(doc);
    }
    private static class MyErrorHandler implements ErrorHandler {
        /** Error handler output goes here */
        private PrintWriter out;

        MyErrorHandler(PrintWriter out) {
            this.out = out;
        }

        /**
         * Returns a string describing parse exception details
         */
        private String getParseExceptionInfo(SAXParseException spe) {
            String systemId = spe.getSystemId();
            if (systemId == null) {
                systemId = "null";
            }
            String info = "URI=" + systemId +
                " Line=" + spe.getLineNumber() +
                ": " + spe.getMessage();
            return info;
        }

        // The following methods are standard SAX ErrorHandler methods.
        // See SAX documentation for more info.

        public void warning(SAXParseException spe) throws SAXException {
            out.println("Warning: " + getParseExceptionInfo(spe));
        }
        
        public void error(SAXParseException spe) throws SAXException {
            String message = "Error: " + getParseExceptionInfo(spe);
            throw new SAXException(message);
        }

        public void fatalError(SAXParseException spe) throws SAXException {
            String message = "Fatal Error: " + getParseExceptionInfo(spe);
            throw new SAXException(message);
        }
    }

}
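
To see what the error handler policy means in practice, here is a small sketch that feeds a malformed document to a parser with a handler following the same policy as MyErrorHandler (report warnings, throw on errors and fatal errors). The class `ErrorHandlerDemo`, the nested `ThrowingHandler`, and the helper `parseProblem` are hypothetical names; the exact parser message text depends on the XML implementation in use.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original article.
class ErrorHandlerDemo {

    /** Warnings are printed; errors and fatal errors abort parsing with a SAXException. */
    static final class ThrowingHandler implements ErrorHandler {
        @Override
        public void warning(SAXParseException spe) {
            System.err.println("Warning: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
        }

        @Override
        public void error(SAXParseException spe) throws SAXException {
            throw new SAXException("Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
        }

        @Override
        public void fatalError(SAXParseException spe) throws SAXException {
            throw new SAXException("Fatal Error: Line=" + spe.getLineNumber() + ": " + spe.getMessage());
        }
    }

    /** Returns null if the XML parses cleanly, otherwise the message produced by the handler. */
    static String parseProblem(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new ThrowingHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        // Well-formed input: no problem reported
        System.out.println(parseProblem("<a/>"));

        // Missing </b>: a well-formedness violation is reported through fatalError
        System.out.println(parseProblem("<a><b></a>"));
    }
}
```

Well-formedness violations arrive through `fatalError`, while validity violations (when validation is enabled) arrive through `error`; in both cases this policy stops the parse instead of returning a partial tree.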

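Next comes the processing of the DOM tree. Here it simply prints the information of each node, which is done by the domecho class. The code is as follows: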

package dom;

import java.io.PrintWriter;

import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class domecho {
    private PrintWriter out;

    /** Indent level */
    private int indent = 0;
    
    /** Indentation will be in multiples of basicIndent  */
    private final String basicIndent = "  ";
    
    domecho(PrintWriter out) {
        this.out = out;
    }

    /**
     * Echo common attributes of a DOM2 Node and terminate output with an
     * EOL character.
     */
    private void printlnCommon(Node n) {
        out.print(" nodeName=\"" + n.getNodeName() + "\"");

        String val = n.getNamespaceURI();
        if (val != null) {
            out.print(" uri=\"" + val + "\"");
        }

        val = n.getPrefix();
        if (val != null) {
            out.print(" pre=\"" + val + "\"");
        }

        val = n.getLocalName();
        if (val != null) {
            out.print(" local=\"" + val + "\"");
        }

        val = n.getNodeValue();
        if (val != null) {
            out.print(" nodeValue=");
            if (val.trim().equals("")) {
                // Whitespace
                out.print("[WS]");
            } else {
                out.print("\"" + n.getNodeValue() + "\"");
            }
        }
        out.println();
    }

    /**
     * Indent to the current level in multiples of basicIndent
     */
    private void outputIndentation() {
        for (int i = 0; i < indent; i++) {
            out.print(basicIndent);
        }
    }

    /**
     * Recursive routine to print out DOM tree nodes
     */
    public void echo(Node n) {
        // Indent to the current level before printing anything
        outputIndentation();

        int type = n.getNodeType();
        switch (type) {
        case Node.ATTRIBUTE_NODE:
            out.print("ATTR:");
            printlnCommon(n);
            break;
        case Node.CDATA_SECTION_NODE:
            out.print("CDATA:");
            printlnCommon(n);
            break;
        case Node.COMMENT_NODE:
            out.print("COMM:");
            printlnCommon(n);
            break;
        case Node.DOCUMENT_FRAGMENT_NODE:
            out.print("DOC_FRAG:");
            printlnCommon(n);
            break;
        case Node.DOCUMENT_NODE:
            out.print("DOC:");
            printlnCommon(n);
            break;
        case Node.DOCUMENT_TYPE_NODE:
            out.print("DOC_TYPE:");
            printlnCommon(n);

            // Print entities if any
            NamedNodeMap nodeMap = ((DocumentType)n).getEntities();
            indent += 2;
            for (int i = 0; i < nodeMap.getLength(); i++) {
                Entity entity = (Entity)nodeMap.item(i);
                echo(entity);
            }
            indent -= 2;
            break;
        case Node.ELEMENT_NODE:
            out.print("ELEM:");
            printlnCommon(n);

            /*
             * Print attributes if any.
             * Note: element attributes are not children of ELEMENT_NODEs;
             * they are properties of their associated ELEMENT_NODE.
             * For this reason, they are printed with 2x the indent level.
             */
            NamedNodeMap atts = n.getAttributes();
            indent += 2;
            for (int i = 0; i < atts.getLength(); i++) {
                Node att = atts.item(i);
                echo(att);
            }
            indent -= 2;
            break;
        case Node.ENTITY_NODE:
            out.print("ENT:");
            printlnCommon(n);
            break;
        case Node.ENTITY_REFERENCE_NODE:
            out.print("ENT_REF:");
            printlnCommon(n);
            break;
        case Node.NOTATION_NODE:
            out.print("NOTATION:");
            printlnCommon(n);
            break;
        case Node.PROCESSING_INSTRUCTION_NODE:
            out.print("PROC_INST:");
            printlnCommon(n);
            break;
        case Node.TEXT_NODE:
            out.print("TEXT:");
            printlnCommon(n);
            break;
        default:
            out.print("UNSUPPORTED NODE: " + type);
            printlnCommon(n);
            break;
        }

        // Print children if any
        indent++;
        for (Node child = n.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            echo(child);
        }
        indent--;
    }
}
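
The `uri`, `pre`, and `local` fields printed by printlnCommon are only populated when the factory is namespace aware. The following sketch illustrates the difference; the class `NamespaceDemo`, the helper `describeRoot`, and the `urn:example` namespace are assumptions made for this example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class NamespaceDemo {

    /** Parses the XML and reports the namespace-related properties of the root element. */
    static String describeRoot(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Node root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "nodeName=" + root.getNodeName()
                    + " uri=" + root.getNamespaceURI()
                    + " pre=" + root.getPrefix()
                    + " local=" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p:a xmlns:p=\"urn:example\"/>";

        // Namespace-aware parsing fills in URI, prefix, and local name
        System.out.println(describeRoot(xml, true));  // nodeName=p:a uri=urn:example pre=p local=a

        // Without namespace awareness the qualified name is just an opaque nodeName
        System.out.println(describeRoot(xml, false)); // nodeName=p:a uri=null pre=null local=null
    }
}
```

This is why printlnCommon guards each of these values with a null check before printing it.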

Reposted from: https://www.cnblogs.com/ct-blog/p/5500444.html
