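Quoted-Printable encoding rules (RFC 1341):
1. An octet is written as =XX, where XX is the octet's value as two
hexadecimal digits. The digits must be 0-9 or A-F (uppercase).
This rule is mandatory except where the rules below allow an
alternative.
2. Octets with decimal values 33-60 and 62-126 (that is, printable
ASCII other than '=') may be written as the corresponding ASCII
characters, without conversion.
3. Octets 9 (TAB) and 32 (SPACE) may also be written as themselves,
but not at the end of an encoded line, where they must be written
as =09 or =20. (Leaving them unencoded elsewhere is optional; they
may always be encoded.)
4. A line break in a text body must be represented in the encoding
by an RFC 822 line break, that is, a CRLF sequence. When CR LF or
LF CR sequences, or isolated CR or LF octets, occur in binary data
rather than as line breaks, they must be encoded as "=0D=0A",
"=0A=0D", "=0D" and "=0A" respectively.
5. (Soft line breaks) Encoded lines must be no more than 76
characters long. To encode a longer line, soft line breaks must be
used: an '=' as the last character of an encoded line marks a line
break that is not part of the data. So if the unencoded line is:
Now's the time for all folk to come to the aid of their country.
it can be represented in Quoted-Printable as:
Now's the time =
for all folk to come=
 to the aid of their country.
This lets long lines be encoded in a way that can be restored to
the user's original input. The trailing CRLF does not count toward
the 76-character limit, but every other character does, including
the '='.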
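The five rules above can be sketched as a small encoder. This is a minimal illustration, not a complete implementation: the names qp_encode_line and qp_encode_text are hypothetical, the input is assumed to be text whose line breaks are all hard line breaks, and soft line breaks are inserted greedily so that the trailing '=' keeps every line within 76 characters.

```python
def qp_encode_line(raw: bytes, limit: int = 76) -> list:
    """Encode one line (without its line break) into Quoted-Printable lines."""
    tokens = []
    for i, octet in enumerate(raw):
        is_last = i == len(raw) - 1
        if 33 <= octet <= 60 or 62 <= octet <= 126:
            tokens.append(chr(octet))        # rule 2: literal representation
        elif octet in (9, 32) and not is_last:
            tokens.append(chr(octet))        # rule 3: TAB/SPACE, not at line end
        else:
            tokens.append(f"={octet:02X}")   # rule 1: =XX with uppercase hex

    lines, current = [], ""
    for token in tokens:
        # Rule 5: keep one column free for the "=" of a soft line break.
        if len(current) + len(token) > limit - 1:
            lines.append(current + "=")
            current = ""
        current += token
    lines.append(current)
    return lines


def qp_encode_text(text: bytes) -> bytes:
    """Rule 4: every hard line break becomes CRLF in the encoded form.

    Simplification: a trailing line break in the input is not preserved.
    """
    encoded = []
    for line in text.splitlines():
        encoded.extend(qp_encode_line(line))
    return "\r\n".join(encoded).encode("ascii")


print(qp_encode_text(b"caf\xe9\nx=1 \n"))
```

In practice Python's standard quopri module and email package already provide this encoding; the sketch only makes the individual rules visible.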
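Because the hyphen '-' represents itself in Quoted-Printable, take
care when encoding a body that goes inside a multipart entity: the
boundary must never appear in the encoded body. (A good approach is
to include "=_" in the boundary, a sequence that can never occur in
a Quoted-Printable body; see the definition of multipart messages in
RFC 1341 for details.)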
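The "=_" strategy works because in a valid Quoted-Printable body an '=' is followed only by two hexadecimal digits or by the end of the line, and '_' is not a hexadecimal digit. A minimal sketch, assuming a random suffix is acceptable; the helper names make_boundary and boundary_is_safe are hypothetical and not taken from the RFC:

```python
import secrets


def make_boundary() -> str:
    """Return a boundary containing "=_", which valid QP output cannot contain."""
    # 32 random hex digits keep the boundary well under the 70-character limit.
    return "=_" + secrets.token_hex(16)


def boundary_is_safe(boundary: str, encoded_body: str) -> bool:
    """Defensive check: the delimiter line "--boundary" must not occur in the body."""
    return ("--" + boundary) not in encoded_body


boundary = make_boundary()
print(boundary, boundary_is_safe(boundary, "caf=E9 -- plain text"))
```

The explicit check is redundant for well-formed Quoted-Printable bodies, but it is cheap and also covers bodies in other encodings.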
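Note: Quoted-Printable is a compromise between readability and
reliability in transport. Bodies encoded with it work reliably
across most mail gateways, but may not work perfectly over a few,
notably those that translate into EBCDIC. (In theory an EBCDIC
gateway could decode a Quoted-Printable body and re-encode it in
Base64, but when RFC 1341 was written no such gateways existed.)
Base64 offers a higher level of confidence. One reasonably reliable
way to get through EBCDIC gateways is to also quote the following
ASCII characters according to rule 1:
!"#$@[\]^`{|}~
See Appendix B of RFC 1341 for more information.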
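The extra quoting for EBCDIC gateways can be sketched as a post-processing step on an already encoded line. This is an illustration under assumptions: the name harden_for_ebcdic is hypothetical, and the step is assumed to run before soft line breaks are inserted, because it lengthens the line.

```python
# The characters listed above; none of them is "=" or a hexadecimal digit,
# so replacing them cannot damage existing =XX sequences.
EBCDIC_UNSAFE = '!"#$@[\\]^`{|}~'


def harden_for_ebcdic(encoded_line: str) -> str:
    """Rewrite the listed characters as =XX according to rule 1."""
    return "".join(
        f"={ord(ch):02X}" if ch in EBCDIC_UNSAFE else ch
        for ch in encoded_line
    )


print(harden_for_ebcdic("a[0]=3D!"))
```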
Because Quoted-Printable data is generally assumed to be
line-oriented, expect the line breaks between its lines to be
altered in transport (translator's note: Unix, Windows and Mac use
different newline conventions), just as plain text mail has always
been altered in Internet mail when passing between systems with
different newline conventions. If such alterations would corrupt the
data, it is more sensible to use Base64 instead of Quoted-Printable.
5.1 Quoted-Printable Content-Transfer-Encoding
The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway.
In this encoding, octets are to be represented as determined
by the following rules:
Rule #1: (General 8-bit representation) Any octet,
except those indicating a line break according to the
newline convention of the canonical form of the data
being encoded, may be represented by an "=" followed by
a two digit hexadecimal representation of the octet's
value. The digits of the hexadecimal alphabet, for this
purpose, are "0123456789ABCDEF". Uppercase letters must
be used when sending hexadecimal data, though a robust
implementation may choose to recognize lowercase
letters on receipt. Thus, for example, the value 12
(ASCII form feed) can be represented by "=0C", and the
value 61 (ASCII EQUAL SIGN) can be represented by
"=3D". Except when the following rules allow an
alternative encoding, this rule is mandatory.
Rule #2: (Literal representation) Octets with decimal
values of 33 through 60 inclusive, and 62 through 126,
inclusive, MAY be represented as the ASCII characters
which correspond to those octets (EXCLAMATION POINT
through LESS THAN, and GREATER THAN through TILDE,
respectively).
Rule #3: (White Space): Octets with values of 9 and 32
MAY be represented as ASCII TAB (HT) and SPACE
characters, respectively, but MUST NOT be so
Borenstein & Freed [Page 14]
RFC 1341   MIME: Multipurpose Internet Mail Extensions   June 1992
represented at the end of an encoded line. Any TAB (HT)
or SPACE characters on an encoded line MUST thus be
followed on that line by a printable character. In
particular, an "=" at the end of an encoded line,
indicating a soft line break (see rule #5) may follow
one or more TAB (HT) or SPACE characters. It follows
that an octet with value 9 or 32 appearing at the end
of an encoded line must be represented according to
Rule #1. This rule is necessary because some MTAs
(Message Transport Agents, programs which transport
messages from one user to another, or perform a part of
such transfers) are known to pad lines of text with
SPACEs, and others are known to remove "white space"
characters from the end of a line. Therefore, when
decoding a Quoted-Printable body, any trailing white
space on a line must be deleted, as it will necessarily
have been added by intermediate transport agents.
Rule #4 (Line Breaks): A line break in a text body
part, independent of what its representation is
following the canonical representation of the data
being encoded, must be represented by a (RFC 822) line
break, which is a CRLF sequence, in the Quoted-
Printable encoding. If isolated CRs and LFs, or LF CR
and CR LF sequences are allowed to appear in binary
data according to the canonical form, they must be
represented using the "=0D", "=0A", "=0A=0D" and
"=0D=0A" notations respectively.
Note that many implementations may elect to encode the
local representation of various content types directly.
In particular, this may apply to plain text material on
systems that use newline conventions other than CRLF
delimiters. Such an implementation is permissible, but
the generation of line breaks must be generalized to
account for the case where alternate representations of
newline sequences are used.
Rule #5 (Soft Line Breaks): The Quoted-Printable
encoding REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded with
the Quoted-Printable encoding, 'soft' line breaks must
be used. An equal sign as the last character on an
encoded line indicates such a non-significant ('soft')
line break in the encoded text. Thus if the "raw" form
of the line is a single unencoded line that says:
Now's the time for all folk to come to the aid of
their country.
This can be represented, in the Quoted-Printable
encoding, as
Now's the time =
for all folk to come=
 to the aid of their country.
This provides a mechanism with which long lines are
encoded in such a way as to be restored by the user
agent. The 76 character limit does not count the
trailing CRLF, but counts all other characters,
including any equal signs.
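Read from the receiving side, the rules above suggest a decoder along the following lines. This is a hedged sketch, not a reference implementation: the name qp_decode is hypothetical, hard line breaks are emitted as CRLF, and malformed input (an '=' not followed by two hexadecimal digits) simply raises ValueError.

```python
def qp_decode(encoded: str) -> bytes:
    """Decode a Quoted-Printable body given as a string of encoded lines."""
    out = bytearray()
    lines = encoded.splitlines()
    for n, line in enumerate(lines):
        # Rule #3: trailing white space was added in transport, so delete it.
        line = line.rstrip(" \t")
        # Rule #5: a final "=" is a soft line break and carries no data.
        soft = line.endswith("=")
        if soft:
            line = line[:-1]
        i = 0
        while i < len(line):
            if line[i] == "=":
                # Rule #1: int(..., 16) also accepts lowercase hex digits.
                out.append(int(line[i + 1:i + 3], 16))
                i += 3
            else:
                out.append(ord(line[i]))
                i += 1
        # Rule #4: a hard line break between encoded lines decodes to CRLF.
        if not soft and n < len(lines) - 1:
            out += b"\r\n"
    return bytes(out)


example = (
    "Now's the time =\r\n"
    "for all folk to come=\r\n"
    " to the aid of their country."
)
print(qp_decode(example).decode("ascii"))
```

Note that the trailing white space is stripped before the soft line break is removed, so the SPACE in front of the "=" on the first example line survives decoding.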
Since the hyphen character ("-") is represented as itself in
the Quoted-Printable encoding, care must be taken, when
encapsulating a quoted-printable encoded body in a multipart
entity, to ensure that the encapsulation boundary does not
appear anywhere in the encoded body. (A good strategy is to
choose a boundary that includes a character sequence such as
"=_" which can never appear in a quoted-printable body. See
the definition of multipart messages later in this
document.)
NOTE: The quoted-printable encoding represents something of
a compromise between readability and reliability in
transport. Bodies encoded with the quoted-printable
encoding will work reliably over most mail gateways, but may
not work perfectly over a few gateways, notably those
involving translation into EBCDIC. (In theory, an EBCDIC
gateway could decode a quoted-printable body and re-encode
it using base64, but such gateways do not yet exist.) A
higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the ASCII
characters
!"#$@[\]^`{|}~
according to rule #1. See Appendix B for more information.
Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the breaks between
the lines of quoted printable data may be altered in
transport, in the same manner that plain text mail has
always been altered in Internet mail when passing between
systems with differing newline conventions. If such
alterations are likely to constitute a corruption of the
data, it is probably more sensible to use the base64
encoding rather than the quoted-printable encoding.
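The difference can be illustrated with Python's standard quopri and base64 modules. The "gateway" here is a hypothetical stand-in that only rewrites line breaks, and the sample data is made up for the illustration.

```python
import base64
import quopri

# A hand-written Quoted-Printable body with two hard line breaks (CRLF).
qp_body = b"caf=E9\r\nline two\r\n"
# Hypothetical gateway that rewrites every CRLF as a bare LF.
qp_after_gateway = qp_body.replace(b"\r\n", b"\n")
# The decoded octets differ, because the line breaks are part of the data.
qp_changed = quopri.decodestring(qp_after_gateway) != quopri.decodestring(qp_body)

# The same octets in base64: line breaks in the encoded form carry no data.
payload = b"caf\xe9\r\nline two\r\n"
b64_body = base64.encodebytes(payload)
# Hypothetical gateway that rewrites every LF as CRLF.
b64_after_gateway = b64_body.replace(b"\n", b"\r\n")
b64_intact = base64.decodebytes(b64_after_gateway) == payload

print(qp_changed, b64_intact)
```

For text this change is usually harmless, since the receiving system wants its own newline convention anyway; for data where every octet matters it is a corruption, which is the case the paragraph above warns about.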