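XML Study Notes, Chapter 1: Getting Started with XML

This article introduces the basics of XML (Extensible Markup Language), including its historical background, how it differs from HTML, and its document formatting rules. It also explores key XML technologies such as DTD, XML Schema, and special character handling.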

XML stands for Extensible Markup Language.

XML, like HTML, is based on the granddaddy of all markup
languages, Standard Generalized Markup Language (SGML).

SGML was created in 1974 as part of an IBM
document-sharing project, and officially became an International Organization for Standardization (ISO) standard in 1986, long before the Internet or anything like it was operational.

In 1998 the World Wide Web Consortium (W3C) combined the basic features that
separate data from format in SGML with an extension of the HTML tag formats that
were adapted for the Web, and came up with the first Extensible Markup Language
(XML) Recommendation. The three pillars of XML are Extensibility, Structure, and
Validity.

The original XML data validation
standard is called Document Type Definition (DTD), and the more recent evolution of
XML data validation is the XML Schema standard.

XML is not data integration. It’s simply the
glue that holds data integration solutions together with a multi-platform “lowest
common denominator” for data transportation.

XML is not HTML. While HTML is designed to describe display characteristics
of data on a Web page to browsers, XML is designed to represent data structures.
XML data can be transformed into HTML using Extensible Stylesheet Language
Transformations (XSLT).

XML documents that meet W3C XML document formatting recommendations
are described as being well-formed XML documents.

Element names can contain letters, numbers, hyphens, underscores, periods, and colons when namespaces are used (more on namespaces later). Element names cannot contain spaces.

Element names can start with a letter, underscore, or colon, but cannot start with a number, with any other non-alphabetic character, or with the letters xml. The basic rules and guidelines for elements apply to attributes as well.
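
The naming rules above can be sketched as a small check. This is only an ASCII approximation of the rules as stated in these notes (the XML specification also allows many non-ASCII letters), and the function name `is_plausible_xml_name` is made up for illustration.

```python
import re

# ASCII-only approximation: first character is a letter, underscore, or colon;
# later characters may also be digits, hyphens, or periods.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_plausible_xml_name(name: str) -> bool:
    """Return True if `name` follows the simplified naming rules in these notes."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return _NAME_RE.fullmatch(name) is not None
```

For example, `order-item` and `_id` pass the check, while `1st`, `first name`, and `xmlData` do not.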

An <?xml?> element is called an XML document declaration. It is an optional
element that is useful for determining the version of XML and the encoding type of the source data. The W3C XML 1.0 specification does not require it for an XML document to be well formed.

UTF stands for Universal Character Set Transformation Format, and the number 8 or
16 refers to the size in bits of the units used to store each character (a single character may take more than one unit). In fact, an XML document that does not specify an encoding type must adhere to either UTF-8 or UTF-16 to be considered a well-formed XML 1.0 document.

Aside from the UTF encodings, any charset name that is registered with the
Internet Assigned Numbers Authority (IANA), such as ISO-8859-1, is an acceptable
value for the XML document encoding.
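
As a rough illustration of why the encoding declaration matters, the sketch below parses the same bytes with and without a declaration using Python's standard `xml.etree.ElementTree` module. The sample documents and the helper name `parse_text` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
undeclared = b'<name>caf\xe9</name>'


def parse_text(raw: bytes):
    """Return the root element's text, or None if the bytes are not well formed."""
    try:
        return ET.fromstring(raw).text
    except ET.ParseError:
        return None


declared_text = parse_text(declared)      # decoded using the declared ISO-8859-1
undeclared_text = parse_text(undeclared)  # treated as UTF-8, so parsing fails
utf8_text = parse_text("<name>café</name>".encode("utf-8"))  # no declaration needed
```

Without a declaration the parser falls back to UTF-8, so the ISO-8859-1 bytes are rejected rather than silently misread.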

The root element must be the first element in the document and must be unique. Quotes must be used on all attribute values. Comments should always follow the SGML comment tag format (<!-- comment -->).
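
A minimal way to try these well-formedness rules is to hand a document to a parser and see whether it is rejected. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "single root, quoted attribute": '<order id="1"><item/></order>',
    "two root elements": "<order/><order/>",
    "unquoted attribute value": "<order id=1/>",
    "overlapping tags": "<a><b></a></b>",
    "SGML-style comment": "<order><!-- a comment --></order>",
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}
```

Only the first and last samples are accepted. Note that this checks well-formedness only, not validity against a DTD or schema.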

Namespaces are a method for separating and identifying duplicate XML element
names in an XML document. A reserved xmlns: prefix is used when declaring a namespace
name and value. The value of the attribute provides the unique identifier for the namespace. Once the namespace is declared, the namespace name can be used as a prefix in element names.

Namespace declarations are recommended if your XML documents have any current or
future potential of being shared with other XML documents that may share the
same element names.
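
To make the idea concrete, here is a small sketch in which two elements share the local name `name` but live in different namespaces. The namespace URIs and prefixes are invented for this example; `xml.etree.ElementTree` is Python's standard parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both children are called "name", but the prefixes
# bind them to different namespace identifiers.
doc = """
<order xmlns:cust="http://example.com/customer"
       xmlns:prod="http://example.com/product">
  <cust:name>Ada</cust:name>
  <prod:name>Widget</prod:name>
</order>
"""

root = ET.fromstring(doc)

# Map prefixes to namespace identifiers for lookups.
ns = {
    "cust": "http://example.com/customer",
    "prod": "http://example.com/product",
}
customer_name = root.find("cust:name", ns).text
product_name = root.find("prod:name", ns).text

# ElementTree stores each tag as "{namespace-identifier}local-name".
tags = [child.tag for child in root]
```

The prefix itself is only shorthand. What distinguishes the two elements is the namespace identifier, which is why ElementTree expands each tag to the `{identifier}name` form.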

A well-formed XML document that also meets all of the requirements of one or more validation specifications, such as a DTD or an XML Schema,
is called a valid XML document.

DTDs are not themselves well-formed XML documents; they use their own declaration syntax. In a DTD, the element and attribute declarations do not have to be in the same order as the elements and attributes that they
represent.

W3C Schemas follow the rules of well-formed XML documents. The element and attribute declarations in a W3C Schema do not have to be in the same order as the elements and attributes that they represent
in an XML document.

Special characters in a well-formed XML document can be referenced via a declared entity, a decimal Unicode character reference, or a hexadecimal character reference. Entity references start with an ampersand (&), decimal character references start with an ampersand and a pound sign (&#), and hexadecimal character references start with an ampersand,
pound sign, and an x (&#x). All entity, decimal, and hexadecimal references end with a semicolon (;).
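
The three reference styles can be compared side by side. In this sketch the sample text is made up, and the parser is Python's standard `xml.etree.ElementTree`, which resolves the references while parsing.

```python
import xml.etree.ElementTree as ET

# &amp;  is a predefined entity reference,
# &#65;  is a decimal character reference for "A",
# &#x41; is a hexadecimal character reference for the same "A",
# &#xE9; is a hexadecimal character reference for "é".
doc = "<t>&amp; &#65; &#x41; &#xE9;</t>"

resolved = ET.fromstring(doc).text
```

After parsing, `resolved` holds the plain characters, so application code never sees the reference syntax.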

A DTD is necessary for any entity reference other than the reserved ones listed below. The values for such entity references must be declared in the DTD rather than in the body of the XML document. Entity references can also be used as variables and combined with other entity references in a DTD.
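
As a hedged sketch of entities used as variables, the example below declares two entities in an internal DTD subset, with the second one built from the first. The entity names and values are invented, and the behaviour shown is that of Python's standard `xml.etree.ElementTree`, which expands entities declared this way.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring two hypothetical entities; "footer" reuses "company".
with_dtd = """<!DOCTYPE note [
  <!ENTITY company "Example Corp">
  <!ENTITY footer "Copyright &company;">
]>
<note>&footer;</note>"""

expanded = ET.fromstring(with_dtd).text

# The same reference without any declaration is rejected by the parser.
try:
    ET.fromstring("<note>&footer;</note>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```

Entity expansion is also a known attack surface for untrusted input, so this is meant for documents you control.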

Reserved Character Entities and References

Entity Reference    Special Character
&amp;               ampersand (&)
&apos;              apostrophe or single quote (')
&gt;                greater-than (>)
&lt;                less-than (<)
&quot;              double quote (")
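
When generating XML from code, these reserved characters are usually escaped by a helper instead of by hand. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils`; the wrapper names `escape_all` and `unescape_all` are made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the two quote characters
# have to be supplied as extra mappings.
_QUOTES = {'"': "&quot;", "'": "&apos;"}
_UNQUOTES = {"&quot;": '"', "&apos;": "'"}


def escape_all(text: str) -> str:
    """Replace all five reserved characters with their entity references."""
    return escape(text, _QUOTES)


def unescape_all(text: str) -> str:
    """Reverse escape_all()."""
    return unescape(text, _UNQUOTES)


raw = 'Tom & "Jerry" <1>'
escaped = escape_all(raw)
round_trip = unescape_all(escaped)
```

Here `escaped` is `Tom &amp; &quot;Jerry&quot; &lt;1&gt;`, and `round_trip` equals the original string.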

Accommodating new character sets as the Unicode specification evolves is the basis of the new features in XML 1.1.

Some new Unicode characters that XML 1.1 processors recognize as part of well-formed element, attribute, and namespace names are not accepted by XML 1.0 document syntax rules. These characters could already be used in XML 1.0 text and attribute values.

Rather than listing the characters that are allowed, XML 1.1 defines which characters specifically cannot be included in well-formed XML documents and considers any undefined characters as part of well-formed XML. This makes it easier to accommodate developing Unicode specifications.

Another feature of XML 1.1 is the capability to handle line-end characters generated
in IBM mainframe file formats, which has been a long-standing issue between XML
documents generated and shared across ASCII and EBCDIC-based platforms. XML
1.1 parsers are required to recognize and accept the EBCDIC line-end character (#x85)
and the Unicode line separator (#x2028). On input, these values, like the carriage
return (decimal 13, #xD), are normalized to the linefeed character (decimal 10, #xA)
before the document is parsed.
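
The line-end handling described above can be approximated in a few lines. This is a sketch only: it assumes every recognized line end is mapped to a single linefeed (#xA), the function name `normalize_line_ends` is made up, and it is not taken from any particular parser.

```python
import re

# Two-character sequences come first so that "\r\n" and "\r\x85"
# each collapse to one linefeed instead of two.
_LINE_END = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_line_ends(text: str) -> str:
    """Map CR LF, CR NEL, CR, NEL (#x85), and LS (#x2028) to a single linefeed."""
    return _LINE_END.sub("\n", text)
```

A parser would apply this kind of normalization to the input before any other processing, so application code only ever sees linefeeds.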

Namespaces for XML 1.1

The essential difference between the XML Namespaces 1.0
and 1.1 recommendations is the ability to “undeclare” a previously defined namespace
declaration and its associated prefix.
