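While validating with WebWork I kept getting the error "Content is not allowed in prolog". Since this is an XML parsing error, I opened the file in question and found that a few lines of Chinese notes at the top had not been commented out. Commenting them out fixed the problem. I had not run into this before, so I searched online and found the following material:

Why DOM4j may throw "Content is not allowed in prolog" when reading an XML file: the XML file is UTF-8 encoded, and if it is edited with UltraEdit, a BOM is added to what was a BOM-less UTF-8 file. DOM4j (dom4j 1.3) does not recognize this BOM. The fix is to upgrade dom4j to 1.6: www.dom4j.org. What is a BOM? http://www.unicode.org/faq/utf_bom.html#22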
The Unicode standard defines the concept of a BOM. BOM stands for Byte Order Mark, a marker that indicates byte order. I found an explanation of the BOM here:
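I read an article about building Java projects with Ant in Eclipse and followed it: I created a new project with the folders classes, dist, doc and lib, plus a build.xml file. I copied the contents of build.xml straight from the web page, and because of that the "<" and ">" characters were full-width (Chinese-style) ones, so I replaced them. But when I selected the project, went to "Project", "Properties", "Builders", "New…", chose "Ant Build" and loaded the "Buildfile", I got the error "Content is not allowed in prolog". I went searching on Google and found something useful: a short article that made me think the format of the build.xml file was the problem.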
In SGML and XML, a document is composed of two sequential parts, the prolog and the instance. You can see this in an HTML example:

1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml">
4 <head>
5 <title>The Symbol Grounding Problem</title>
6 </head>
7 <body>
8 </body>
9 </html>

In this example, the prolog is lines 1-2, and the instance begins on line 3. The prolog includes the DOCTYPE declaration, the external subset (called the DTD), and the internal subset (which you seldom see, but it's legal). The document instance includes the document element (in this case <html> and all of its descendant content).

You generally don't want to see the prolog, and you generally don't want to store it. The DOCTYPE declaration provides references to the DTD, which is instantiated as part of the process of validating the document. You may want to store the reference(s), but you wouldn't want to store the DTD each time you store the document, as that would be a real waste (the DTD is often bigger than the document).

It sounds like your well-formed and valid document isn't being considered as such by the XML processor. The error message indicates that there is content (i.e., either elements or character data) in the part of the document considered as the prolog. You may be missing the last ">" on line 2 above, as that would normally be the beginning of the internal subset. If it found "<html" (or something similar), you might get that error.
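The situation described above, character data appearing where the parser still expects the prolog, can be reproduced with a small Java sketch. It uses the JAXP parser bundled with the JDK; the class name PrologCheck and the method isWellFormed are made up for this illustration. The exact error text depends on the parser and locale, so the sketch only reports whether parsing succeeded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of any library.
final class PrologCheck {

    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Parse errors (including stray content in the prolog) land here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A plain document, then the same document with uncommented notes in front.
        System.out.println(isWellFormed("<root/>"));
        System.out.println(isWellFormed("some notes<root/>"));
    }
}
```

Wrapping the leading notes in an XML comment (after the XML declaration, if there is one) makes the second input parse again, which matches the fix described at the top of this post.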
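While writing Hibernate mapping files in UltraEdit, I found that UltraEdit automatically inserts a special character at the very beginning of UTF-8 encoded files. The character is invisible in UltraEdit but shows up in other editors. When dom4j parses such a file, the "content is not allowed in prolog" error occurs. Removing the character with another editor fixes the error.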
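If switching editors is not convenient, the leading character can also be removed in code before the bytes reach the parser. The following is a minimal sketch, assuming the "special character" is the UTF-8 BOM (the bytes EF BB BF); the class BomStripper and its methods are hypothetical names, not part of dom4j or any other library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of dom4j.
final class BomStripper {

    /** Returns true if the data starts with the UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns the data without a leading UTF-8 BOM; unchanged if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] clean = stripUtf8Bom(withBom);
        System.out.println(new String(clean, StandardCharsets.UTF_8));
    }
}
```

The cleaned bytes can then be wrapped in a java.io.ByteArrayInputStream and handed to whichever parser is in use.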
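You should switch to a different XML parser. The UTF-8 file header (the BOM) is part of the Unicode standard and is also mentioned in the XML specification. Some XML parsers written in Java are rather poor and do not recognize this header, but the better ones do, for example Apache's parser.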
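As a rough way to check how a given parser treats the BOM, the sketch below feeds a BOM-prefixed byte stream to the JAXP parser that ships with the JDK, which is derived from Apache Xerces and should accept it. The class BomParseDemo and its methods are hypothetical names chosen for this example. Note that the input is passed as bytes; handing the parser an already-decoded string or Reader bypasses its encoding detection, so the result may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of any library.
final class BomParseDemo {

    /** Encodes the text as UTF-8 and prepends the BOM bytes EF BB BF. */
    static byte[] withUtf8Bom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    /** Parses the bytes and returns the name of the document element. */
    static String rootNameOf(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // A parser that rejects the BOM would end up here.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        System.out.println(rootNameOf(withUtf8Bom(xml)));
    }
}
```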