Fix Your Site With the Right DOCTYPE!

This article explains why the DOCTYPE matters in web standards and provides correct DOCTYPE examples that help sites display properly in modern browsers.
 
April 12, 2002

You’ve done all the right stuff, but your site doesn’t look or work as it should in the latest browsers.

You’ve written valid XHTML and CSS. You’ve used the W3C standard Document Object Model (DOM) to manipulate dynamic page elements. Yet, in browsers designed to support these very standards, your site is failing. A faulty DOCTYPE is likely to blame.

This little article will provide you with DOCTYPEs that work, and explain the practical, real–world effect of these seemingly abstract tags.

WHY A DOCTYPE?

Per HTML and XHTML standards, a DOCTYPE (short for “document type declaration”) informs the validator which version of (X)HTML you’re using, and must appear at the very top of every web page. DOCTYPEs are a key component of compliant web pages: your markup and CSS won’t validate without them.

As mentioned in previous ALA articles (and in other interesting places), DOCTYPEs are also essential to the proper rendering and functioning of web documents in compliant browsers like Mozilla, IE5/Mac, and IE6/Win.

A recent DOCTYPE that includes a full URI (a complete web address) tells these browsers to render your page in standards–compliant mode, treating your (X)HTML, CSS, and DOM as you expect them to be treated.

Using an incomplete or outdated DOCTYPE—or no DOCTYPE at all—throws these same browsers into “Quirks” mode, where the browser assumes you’ve written old-fashioned, invalid markup and code per the depressing industry norms of the late 1990s.

In this setting, the browser will attempt to parse your page in backward–compatible fashion, rendering your CSS as it might have looked in IE4, and reverting to a proprietary, browser–specific DOM. (IE reverts to the IE DOM; Mozilla and Netscape 6 revert to who knows what.)

Clearly, this is not what you want. But it is often what you’ll get, due to the preponderance of incorrect or incomplete DOCTYPE information this article hopes to correct.
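The rule of thumb described above can be sketched in a few lines of Python. This is only a model of the rule as this article states it (a DOCTYPE with a complete URI means standards mode, anything else means Quirks mode); it is not the actual sniffing logic of any browser, it does not check how recent the DOCTYPE is, and the function name `expected_mode` is made up for illustration.

```python
import re
from urllib.parse import urlparse

# Matches <!DOCTYPE name PUBLIC "public-id" "system-id">; the system id is optional.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def expected_mode(page_source: str) -> str:
    """Return 'standards' or 'quirks' per the article's rule of thumb."""
    match = DOCTYPE_RE.search(page_source)
    if match is None:
        return "quirks"  # no DOCTYPE at all
    system_id = match.group(2)
    if not system_id:
        return "quirks"  # DOCTYPE without any URI
    parsed = urlparse(system_id)
    if not (parsed.scheme and parsed.netloc):
        return "quirks"  # relative (incomplete) URI
    return "standards"  # complete URI present
```

Feeding it the top of a page gives a quick guess at which mode the browsers named above would choose, under the assumptions stated.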

(Note: The Opera browser does not play by these rules; it always attempts to render pages in standards–compliant mode. Go, Opera! On the other hand, Opera does not yet offer solid support for the W3C DOM. But they’re working on it.) {Ed: Since this article was first published, Opera has delivered the DOM-compliant Opera 7 browser.}

WHERE HAVE ALL THE DOCTYPES GONE?

Since DOCTYPEs are vital to the proper functioning of web standards in browsers, and since W3C is a leading creator of web standards, you might expect W3C’s website to provide a listing of proper DOCTYPEs, and you might also expect to be able to find this information quickly and easily in a single location. But as of this writing, you can’t. {Ed. Prompted in part by this article, the W3C now lists standard DOCTYPEs on its site. You will find the listing a few screens into the W3C tutorial, “My Web site is standard. And yours?”}

W3.org is not A List Apart, WebReference, or Webmonkey. It’s not intended to help web designers, developers, and content folks get up to speed on the latest technological recommendations and practices. That’s not its job.

W3C does publish a series of tutorials, though most web developers are unaware of it. Mainly, though, W3C’s site houses a collection of proposals, drafts, and Recommendations, written by geeks for geeks. And when I say geeks, I don’t mean ordinary web professionals like you and me. I mean geeks who make the rest of us look like Grandma on the first day She’s Got Mail.™

You can search for DOCTYPEs all day at w3.org without finding one page that lists them all. And when you do hunt down a DOCTYPE (generally in relation to a particular Recommendation or Working Draft), it’s often one that won’t work on your site.

Scattered throughout W3C’s site are DOCTYPEs with missing URIs, and DOCTYPEs with relative URIs that point to documents on W3C’s own site. Once removed from W3C’s site and used on your web pages, these URIs point to non–existent documents, thus fouling up your best efforts and the browser’s.

For instance, many sites sport this DOCTYPE, copied and pasted directly from w3.org:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">

If you look at the last part of the DOCTYPE (“DTD/xhtml1-strict.dtd”), you’ll see that it is a relative link to a document on W3C’s site. Since that document is on W3C’s site but not yours, the URI is useless to the browser.

The DOCTYPE you’d actually want to use is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Notice that the latter DOCTYPE includes a complete URI at the end of the tag. Since the tag provides a valid location on the web, the browser knows where to find it, and will render your document in standards–compliant mode.
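To see why the relative form stops pointing at the right document once it leaves w3.org, here is a small Python illustration using the standard library's `urljoin`. It shows URI resolution only; the `www.example.com` address stands in for your own site and is purely hypothetical.

```python
from urllib.parse import urljoin

# The relative URI from the copied-and-pasted DOCTYPE.
RELATIVE_SYSTEM_ID = "DTD/xhtml1-strict.dtd"


def resolve_system_id(page_url: str, system_id: str = RELATIVE_SYSTEM_ID) -> str:
    """Resolve a DOCTYPE's URI against the address of the page that contains it."""
    return urljoin(page_url, system_id)


# On W3C's own site the relative URI resolves to the real DTD.
on_w3c = resolve_system_id("http://www.w3.org/TR/xhtml1/")

# On a hypothetical site of your own it resolves to a document that is not there.
on_your_site = resolve_system_id("http://www.example.com/about/index.html")

print(on_w3c)        # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
print(on_your_site)  # http://www.example.com/about/DTD/xhtml1-strict.dtd
```

The complete URI avoids the problem because it resolves to the same address no matter which page carries it.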

DOCTYPES THAT WORK

So what DOCTYPEs should we use? Glad you asked. The following complete DOCTYPEs are the ones we need:

HTML 4.01 Strict, Transitional, Frameset

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0 Strict, Transitional, Frameset

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1 DTD

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

NEXT STEPS

How can you help improve support for standards on the web? Besides bookmarking this page (and copying and pasting these DOCTYPEs for your own use), if your web editor inserts DOCTYPEs, you might want to check them out and compare them to the list above.

Many well–intentioned software makers have cut and pasted incomplete DOCTYPEs from W3C into their software. Result: when you use these programs’ built–in functionality to insert DOCTYPEs in your pages, the browsers go into Quirks mode, undoing all your hard work.

It’s worth contacting the folks who make your favorite authoring package, showing them the proper DOCTYPEs, and politely requesting them to address this issue in an incremental upgrade. (In some cases, you may also be able to modify your editor yourself.)
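If you would rather not compare DOCTYPEs by eye, the check can be roughed out in Python. This is a minimal sketch, not a validator: the table is copied from the list in this article, while the function name `check_doctype` and its return strings are invented for illustration.

```python
import re

# Public identifier -> complete URI, copied from the DOCTYPE list in this article.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "http://www.w3.org/TR/html4/loose.dtd",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "http://www.w3.org/TR/html4/frameset.dtd",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    "-//W3C//DTD XHTML 1.1//EN": "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}

# Matches <!DOCTYPE name PUBLIC "public-id" "system-id">; the system id is optional.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def check_doctype(page_source: str) -> str:
    """Compare the DOCTYPE your editor inserted with the list and describe the result."""
    match = DOCTYPE_RE.search(page_source)
    if match is None:
        return "no DOCTYPE found"
    public_id, system_id = match.group(1), match.group(2)
    expected_uri = KNOWN_DOCTYPES.get(public_id)
    if expected_uri is None:
        return "public identifier not in the list"
    if system_id != expected_uri:
        return f"URI should be {expected_uri}"
    return "ok"
```

Run it over a page your authoring tool has just generated; anything other than "ok" is worth a closer look before you write to the vendor.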

COMING SOON

We have every reason to believe that W3C’s site will soon sport a handy listing of accurate, usable DOCTYPEs and other essential information in an easy–to–find location. In fact, Karl Dubost, Conformance Manager of W3C’s Quality Assurance team, contributed to the information in this little article.

Likewise, when The Web Standards Project relaunches (real soon), it will also provide this information. {Ed. note: The Web Standards Project relaunched in late 2002.}

But sites are being designed and built every day, and you need this information now—so there it is.


Happy authoring and rendering!

### What !DOCTYPE and !ENTITY Mean in XML, and How They Relate to Security

#### 1. **What the !DOCTYPE declaration does**

`!DOCTYPE` is an XML document's **document type declaration**; it associates the document with a DTD (document type definition). It defines:

- the document's root element (for example `<!DOCTYPE Catalog ...>`)
- structural rules for elements and constraints on attributes
- the **entity declaration area** (where `!ENTITY` definitions are allowed)

```xml
<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!ENTITY company "ABC Corp"> <!-- entity definition -->
]>
<Catalog>...</Catalog>
```

A DTD is essentially a set of **grammar constraints** that ensures an XML document has the expected structure[^1][^2].

---

#### 2. **What the !ENTITY declaration does**

`!ENTITY` defines a **reusable unit of data**. There are two kinds:

- **Internal entity**: the value is embedded directly in the XML
  ```xml
  <!ENTITY greeting "Hello"> <!-- referenced as &greeting; -->
  ```
- **External entity**: the value comes from an external resource (high risk!)
  ```xml
  <!ENTITY file SYSTEM "file:///etc/passwd"> <!-- referencing &file; reads the file -->
  ```

---

#### 3. **Relationship to disabling DTDs and external entities**

| Security setting | Mathematical expression | Mechanism |
|----------|----------|----------|
| **`disallow-doctype-decl=true`**<br>(disable DTDs) | The parse function $P(X)$ becomes:<br>$P_{\text{safe}}(X) = P(X) \mid \text{DOCTYPE} \notin X$ | The parser rejects any document that contains a `<!DOCTYPE>` declaration, so entity definitions (including attack payloads) never take effect[^3] |
| **`external-general-entities=false`**<br>(disable external entities) | The entity resolution function $E(e)$ becomes:<br>$E_{\text{safe}}(e) = \begin{cases} e & \text{internal entity} \\ \text{null} & \text{external entity} \end{cases}$ | Internal entities are still resolved normally, but loading external resources (files, URLs) is blocked[^4] |

**How the two protections work together**:

- Disabling DTDs removes the possibility of defining entities at all (the attack entry point disappears).
- Disabling external entities cuts off access to external resources even when a DTD is present (the attack path is broken).

Combining the two defends against XXE (XML External Entity injection) attacks[^3][^4].

---

#### 4. **Their roles in an XXE attack**

Attackers typically use the two together:

```xml
<!DOCTYPE attack [                         <!-- !DOCTYPE declaration -->
  <!ENTITY xxe SYSTEM "file:///secret">    <!-- !ENTITY definition -->
]>
<data>&xxe;</data>                         <!-- entity reference -->
```

- **!DOCTYPE** creates the attack vehicle (the DTD environment)
- **!ENTITY** injects the malicious operation (file read or SSRF)

With the features above disabled, the parser rejects or ignores payloads of this kind.

---

### Summary

| Concept | Function | Security risk | Effect of disabling |
|------------|-------------------------------|--------------------------|------------------------------|
| **!DOCTYPE** | Associates a DTD and defines the document's structural rules | Provides the environment in which entity attacks run | Removes the ability to define entities at all |
| **!ENTITY** | Defines internal or external data references | External entities can lead to file disclosure or SSRF | Blocks loading of external resources |

> With both disabled, the grammar the parser handles is restricted to:
> $$G_{\text{secure}} = (V, \Sigma, R - \{ \text{DOCTYPE rules}, \text{external entity rules} \}, S)$$
> where $R$ is the original set of parsing rules, from which the high-risk rules have been removed[^3].
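The "reject any document that declares a DOCTYPE" behaviour can be approximated in Python with the standard library's expat binding. This is a hedged analogue, not the parser feature named in the table: `disallow-doctype-decl` is a setting of other XML parsers, while the names `DoctypeForbidden` and `parse_without_doctype` below are invented for this sketch.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_doctype(xml_text: str) -> list:
    """Parse xml_text, refusing any DOCTYPE; return the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called at the start of the DOCTYPE, before any entity is declared.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names
```

A plain document such as `<data><item/></data>` parses normally, while the attack payload shown earlier raises `DoctypeForbidden` before its `&xxe;` entity is ever declared.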