XML解析与DOM操作-优快云博客

本文链接：https://blog.youkuaiyun.com/u011984172/article/details/43734415

上一篇中我们讨论了解析器如何将XML文件转化为Document（总体大概知道是如何解析，还需后期对底层代码好好了解），下面我们看看转化为Document之后，是如何取的Document里面的一些元素节点的呢？

再看代码之后我要了解一些基础的知识：

java.util.Vector：我个人的理解它的本质是一个大小可变化的数组，在这个基础上又封装了一些方法来存取数据

上一篇我们知道：Document document = builder.parse(inputStream);返回的对象是DeferredDocumentImpl

那我们继续逐行分析我们的例子代码：Element element = document.getDocumentElement();

它是调用父类com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl的方法：

/**
     * Convenience method, allowing direct access to the child node
     * which is considered the root of the actual document content. For
     * HTML, where it is legal to have more than one Element at the top
     * level of the document, we pick the one with the tagName
     * "HTML". For XML there should be only one top-level
     *
     * (HTML not yet supported.)
     */
    public Element getDocumentElement() {
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        return docElement;
    }

最后返回的对象是com.sun.org.apache.xerces.internal.dom.ElementImpl：

我们在来看看这句：NodeList bookNodes = element.getElementsByTagName("book");

/**
     * Returns a NodeList of all descendent nodes (children,
     * grandchildren, and so on) which are Elements and which have the
     * specified tag name.
     * <p>
     * Note: NodeList is a "live" view of the DOM. Its contents will
     * change as the DOM changes, and alterations made to the NodeList
     * will be reflected in the DOM.
     *
     * @param tagname The type of element to gather. To obtain a list of
     * all elements no matter what their names, use the wild-card tag
     * name "*".
     *
     * @see DeepNodeListImpl
     */
    public NodeList getElementsByTagName(String tagname) {
    	return new DeepNodeListImpl(this,tagname);
    }

看看这句：bookNodes.getLength()：

/** Returns the length of the node list. */
    public int getLength() {
        // Preload all matching elements. (Stops when we run out of subtree!)
        item(java.lang.Integer.MAX_VALUE);
        return nodes.size();
    }

这里就用到了上面说的Vector知识点：

/** Returns the node at the specified index. */
    public Node item(int index) {
    	Node thisNode;

        // Tree changed. Do it all from scratch!
    	if(rootNode.changes() != changes) {
            nodes   = new Vector();     
            changes = rootNode.changes();
    	}
    
        // In the cache
    	if (index < nodes.size())      
    	    return (Node)nodes.elementAt(index);
    
        // Not yet seen
    	else {
    
            // Pick up where we left off (Which may be the beginning)
    		if (nodes.size() == 0)     
    		    thisNode = rootNode;
    		else
    		    thisNode=(NodeImpl)(nodes.lastElement());
    
    		// Add nodes up to the one we're looking for
    		while(thisNode != null && index >= nodes.size()) {
    			thisNode=nextMatchingElementAfter(thisNode);
    			if (thisNode != null)
    			    nodes.addElement(thisNode);
    		    }

            // Either what we want, or null (not avail.)
		    return thisNode;           
	    }

    } // item(int):Node

下面这个方法Node current一开始传入进来就是DeferredElementImpl类

    /** 
     * Iterative tree-walker. When you have a Parent link, there's often no
     * need to resort to recursion. NOTE THAT only Element nodes are matched
     * since we're specifically supporting getElementsByTagName().
     */
    protected Node nextMatchingElementAfter(Node current) {

	    Node next;
	    while (current != null) {
		    // Look down to first child.
		    if (current.hasChildNodes()) {
			    current = (current.getFirstChild());
		    }

		    // Look right to sibling (but not from root!)
		    else if (current != rootNode && null != (next = current.getNextSibling())) {
				current = next;
			}

			// Look up and right (but not past root!)
			else {
				next = null;
				for (; current != rootNode; // Stop when we return to starting point
					current = current.getParentNode()) {

					next = current.getNextSibling();
					if (next != null)
						break;
				}
				current = next;
			}

			// Have we found an Element with the right tagName?
			// ("*" matches anything.)
		    if (current != rootNode 
		        && current != null
		        && current.getNodeType() ==  Node.ELEMENT_NODE) {
			if (!enableNS) {
			    if (tagName.equals("*") ||
				((ElementImpl) current).getTagName().equals(tagName))
			    {
				return current;
			    }
			} else {
			    // DOM2: Namespace logic. 
			    if (tagName.equals("*")) {
				if (nsName != null && nsName.equals("*")) {
				    return current;
				} else {
				    ElementImpl el = (ElementImpl) current;
				    if ((nsName == null
					 && el.getNamespaceURI() == null)
					|| (nsName != null
					    && nsName.equals(el.getNamespaceURI())))
				    {
					return current;
				    }
				}
			    } else {
				ElementImpl el = (ElementImpl) current;
				if (el.getLocalName() != null
				    && el.getLocalName().equals(tagName)) {
				    if (nsName != null && nsName.equals("*")) {
					return current;
				    } else {
					if ((nsName == null
					     && el.getNamespaceURI() == null)
					    || (nsName != null &&
						nsName.equals(el.getNamespaceURI())))
					{
					    return current;
					}
				    }
				}
			    }
			}
		    }

		// Otherwise continue walking the tree
	    }

	    // Fell out of tree-walk; no more instances found
	    return null;

    } // nextMatchingElementAfter(int):Node

我们先来看看com.sun.org.apache.xerces.internal.dom.ParentNode类中的current.hasChildNodes()这句都做了些什么：

 /**
     * Test whether this node has any children. Convenience shorthand
     * for (Node.getFirstChild()!=null)
     */
    public boolean hasChildNodes() {
        if (needsSyncChildren()) {
            synchronizeChildren();
        }
        return firstChild != null;
    }

这里的synchronizeChildren()方法是实现类DeferredElementImpl实现如下：

protected final void synchronizeChildren() {
        DeferredDocumentImpl ownerDocument =
            (DeferredDocumentImpl) ownerDocument();
        ownerDocument.synchronizeChildren(this, fNodeIndex);
    } // synchronizeChildren()

仅仅针对这个例子而言：注意看getNodeObject(index)返回的是ChildNode对象其实是DeferredTextImpl，因为element下面还有text

这里就把text对象赋值给element： p.firstChild = firstNode;所以在后面才能取到

    /**
     * Synchronizes the node's children with the internal structure.
     * Fluffing the children at once solves a lot of work to keep
     * the two structures in sync. The problem gets worse when
     * editing the tree -- this makes it a lot easier.
     * This is not directly used in this class but this method is
     * here so that it can be shared by all deferred subclasses of ParentNode.
     */
    protected final void synchronizeChildren(ParentNode p, int nodeIndex) {

        // we don't want to generate any event for this so turn them off
        boolean orig = getMutationEvents();
        setMutationEvents(false);

        // no need to sync in the future
        p.needsSyncChildren(false);

        // create children and link them as siblings
        ChildNode firstNode = null;
        ChildNode lastNode = null;
        for (int index = getLastChild(nodeIndex);
             index != -1;
             index = getPrevSibling(index)) {

            ChildNode node = (ChildNode) getNodeObject(index);
            if (lastNode == null) {
                lastNode = node;
            }
            else {
                firstNode.previousSibling = node;
            }
            node.ownerNode = p;
            node.isOwned(true);
            node.nextSibling = firstNode;
            firstNode = node;
        }
        if (lastNode != null) {
            p.firstChild = firstNode;
            firstNode.isFirstChild(true);
            p.lastChild(lastNode);
        }

        // set mutation events flag back to its original value
        setMutationEvents(orig);

    } // synchronizeChildren(ParentNode,int):void

我们可以看到在 type == 3，会新建DeferredTextImpl对象：

/** Instantiates the requested node object. */
    public DeferredNode getNodeObject(int nodeIndex) {

        // is there anything to do?
        if (nodeIndex == -1) {
            return null;
        }

        // get node type
        int chunk = nodeIndex >> CHUNK_SHIFT;
        int index = nodeIndex & CHUNK_MASK;
        int type = getChunkIndex(fNodeType, chunk, index);
        if (type != Node.TEXT_NODE && type != Node.CDATA_SECTION_NODE) {
            clearChunkIndex(fNodeType, chunk, index);
        }

        // create new node
        DeferredNode node = null;
        switch (type) {

            //
            // Standard DOM node types
            //

            case Node.ATTRIBUTE_NODE: {
                if (fNamespacesEnabled) {
                    node = new DeferredAttrNSImpl(this, nodeIndex);
                } else {
                    node = new DeferredAttrImpl(this, nodeIndex);
                }
                break;
            }

            case Node.CDATA_SECTION_NODE: {
                node = new DeferredCDATASectionImpl(this, nodeIndex);
                break;
            }

            case Node.COMMENT_NODE: {
                node = new DeferredCommentImpl(this, nodeIndex);
                break;
            }

            // NOTE: Document fragments can never be "fast".
            //
            //       The parser will never ask to create a document
            //       fragment during the parse. Document fragments
            //       are used by the application *after* the parse.
            //
            // case Node.DOCUMENT_FRAGMENT_NODE: { break; }
            case Node.DOCUMENT_NODE: {
                // this node is never "fast"
                node = this;
                break;
            }

            case Node.DOCUMENT_TYPE_NODE: {
                node = new DeferredDocumentTypeImpl(this, nodeIndex);
                // save the doctype node
                docType = (DocumentTypeImpl)node;
                break;
            }

            case Node.ELEMENT_NODE: {

                if (DEBUG_IDS) {
                    System.out.println("getNodeObject(ELEMENT_NODE): "+nodeIndex);
                }

                // create node
                if (fNamespacesEnabled) {
                    node = new DeferredElementNSImpl(this, nodeIndex);
                } else {
                    node = new DeferredElementImpl(this, nodeIndex);
                }

                // save the document element node
                if (docElement == null) {
                    docElement = (ElementImpl)node;
                }

                // check to see if this element needs to be
                // registered for its ID attributes
                if (fIdElement != null) {
                    int idIndex = binarySearch(fIdElement, 0,
                                               fIdCount-1, nodeIndex);
                    while (idIndex != -1) {

                        if (DEBUG_IDS) {
                            System.out.println("  id index: "+idIndex);
                            System.out.println("  fIdName["+idIndex+
                                               "]: "+fIdName[idIndex]);
                        }

                        // register ID
                        String name = fIdName[idIndex];
                        if (name != null) {
                            if (DEBUG_IDS) {
                                System.out.println("  name: "+name);
                                System.out.print("getNodeObject()#");
                            }
                            putIdentifier0(name, (Element)node);
                            fIdName[idIndex] = null;
                        }

                        // continue if there are more IDs for
                        // this element
                        if (idIndex + 1 < fIdCount &&
                            fIdElement[idIndex + 1] == nodeIndex) {
                            idIndex++;
                        }
                        else {
                            idIndex = -1;
                        }
                    }
                }
                break;
            }

            case Node.ENTITY_NODE: {
                node = new DeferredEntityImpl(this, nodeIndex);
                break;
            }

            case Node.ENTITY_REFERENCE_NODE: {
                node = new DeferredEntityReferenceImpl(this, nodeIndex);
                break;
            }

            case Node.NOTATION_NODE: {
                node = new DeferredNotationImpl(this, nodeIndex);
                break;
            }

            case Node.PROCESSING_INSTRUCTION_NODE: {
                node = new DeferredProcessingInstructionImpl(this, nodeIndex);
                break;
            }

            case Node.TEXT_NODE: {
                node = new DeferredTextImpl(this, nodeIndex);
                break;
            }

            //
            // non-standard DOM node types
            //

            case NodeImpl.ELEMENT_DEFINITION_NODE: {
                node = new DeferredElementDefinitionImpl(this, nodeIndex);
                break;
            }

            default: {
                throw new IllegalArgumentException("type: "+type);
            }

        } // switch node type

        // store and return
        if (node != null) {
            return node;
        }

        // error
        throw new IllegalArgumentException();

    } // createNodeObject(int):Node

我这里有一点搞不明白，当执行完这句话node = new DeferredElementImpl(this, nodeIndex);之后，node里面属性name的值就有了。

后来也看了很多次，发现每次传入nodeIndex的值都不一样，我在想原因是不是在这里呢？