QP (Quoted-Printable) Encoding

Quoted-Printable encoding rules (RFC 1341):

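1. An octet is represented as =XX, where XX is the octet's value
written as two hexadecimal digits drawn from 0-9 and A-F (uppercase
letters must be used when encoding). This rule is mandatory except
where the rules below allow an alternative.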

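2. Octets with decimal values 33-60 and 62-126 (that is, every
printable ASCII character except '=', which is 61) may be written as
the corresponding ASCII characters without conversion.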

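3. Octets with decimal values 9 (TAB) and 32 (SPACE) may also be
written literally, but not at the end of an encoded line, where they
must be encoded as =09 or =20. (Writing them literally is optional;
they may always be encoded.)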

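4. A line break in a text body must be represented in the encoded
output by a CRLF sequence, which is the RFC 822 line break. Where the
canonical form allows isolated CR or LF octets, or LF CR and CR LF
sequences, to appear in binary data, they must be encoded as
"=0D", "=0A", "=0A=0D" and "=0D=0A" respectively.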

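5. (Soft line breaks) Quoted-Printable requires that encoded lines be
no more than 76 characters long. Longer lines must be split with soft
line breaks: an '=' as the last character of an encoded line marks a
line break that is not part of the data. For example, suppose the
unencoded line is:

Now's the time for all folk to come to the aid of their country.

In Quoted-Printable this can be represented as: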

Now's the time =
for all folk to come=
 to the aid of their country.

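This provides a mechanism for encoding long lines so that the user
agent can restore the original text. The 76-character limit does not
count the trailing CRLF, but it does count every other character,
including any '=' signs.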
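The five rules can be sketched as a small encoder for a single line of
text. This is a minimal illustration, not a reference implementation:
the function name `qp_encode_text_line` and the greedy line-packing
strategy are assumptions of this sketch, and it handles one line at a
time (the caller is assumed to join encoded lines with CRLF, per rule 4).

```python
def qp_encode_text_line(raw: bytes, max_len: int = 76) -> str:
    """Encode one text line (without its line terminator) as Quoted-Printable.

    Hypothetical helper for illustration; assumes max_len >= 4.
    """
    # Turn each octet into a literal character or an =XX token.
    tokens = []
    last = len(raw) - 1
    for i, octet in enumerate(raw):
        if 33 <= octet <= 60 or 62 <= octet <= 126:
            tokens.append(chr(octet))        # rule 2: printable, not "="
        elif octet in (9, 32) and i != last:
            tokens.append(chr(octet))        # rule 3: TAB/SPACE, not at line end
        else:
            tokens.append(f"={octet:02X}")   # rule 1: uppercase hex

    # Rule 5: pack tokens greedily, keeping one column free for the
    # trailing "=" of a soft line break. Tokens are never split.
    lines, current = [], ""
    for tok in tokens:
        if current and len(current) + len(tok) > max_len - 1:
            lines.append(current + "=")
            current = tok
        else:
            current += tok
    lines.append(current)
    return "\r\n".join(lines)


print(qp_encode_text_line("caf\u00e9 = coffee ".encode("utf-8")))
```

Reserving a column on every line is slightly conservative (the final
line could use all 76 columns), but it keeps the packing loop simple.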

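Because the hyphen '-' represents itself in Quoted-Printable, care is
needed when a Quoted-Printable body is encapsulated in a multipart
entity: the encapsulation boundary must not appear anywhere in the
encoded body. (A good strategy is to choose a boundary that contains
a sequence such as "=_", which can never appear in a Quoted-Printable
body; see the definition of multipart messages in RFC 1341.)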
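The boundary strategy can be sketched as follows. The helper names
`make_boundary` and `boundary_is_safe` are hypothetical, and the random
suffix is just one way to make boundaries distinct. The sketch relies
on the fact that in a valid Quoted-Printable body "=" is followed only
by two hexadecimal digits or a line break, never by "_".

```python
import secrets


def make_boundary() -> str:
    """Return a multipart boundary that starts with "=_".

    Hypothetical helper: "=_" cannot occur in a well-formed
    Quoted-Printable body, so the boundary cannot collide with it.
    """
    return "=_" + secrets.token_hex(12)  # 2 + 24 characters


def boundary_is_safe(boundary: str, encoded_body: str) -> bool:
    """Check that the delimiter line "--boundary" is absent from the body."""
    return ("--" + boundary) not in encoded_body


boundary = make_boundary()
body = "Now's the time =\r\nfor all folk to come=\r\n to the aid of their country."
print(boundary, boundary_is_safe(boundary, body))
```

The explicit check is still useful for boundaries chosen some other
way, or for bodies that are not Quoted-Printable.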

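Note: Quoted-Printable is a compromise between readability and
reliability in transport. Bodies encoded with it work reliably over
most mail gateways, but may not work perfectly over a few, notably
those that translate into EBCDIC. (In theory an EBCDIC gateway could
decode a Quoted-Printable body and re-encode it using Base64, but
RFC 1341 notes that no such gateways existed yet.) Base64 offers a
higher level of confidence. A way to get reasonably reliable transport
through EBCDIC gateways is to also quote the following ASCII
characters according to rule 1: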

!"#$@[\]^`{|}~

See Appendix B of RFC 1341 for more information.
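The extra quoting for EBCDIC gateways can be sketched like this. The
names `EBCDIC_UNSAFE` and `encode_ebcdic_safe` are assumptions of this
example. For brevity the sketch encodes every octet outside the safe
literal set, including TAB and SPACE (which rule 1 always permits), and
it does not insert soft line breaks.

```python
# The characters listed above, which rule 1 can quote for EBCDIC safety.
EBCDIC_UNSAFE = b"!\"#$@[\\]^`{|}~"


def encode_ebcdic_safe(raw: bytes) -> str:
    """Quoted-Printable encode with the extra EBCDIC-motivated quoting.

    Hypothetical helper: no soft line breaks, white space always encoded.
    """
    out = []
    for octet in raw:
        printable = 33 <= octet <= 60 or 62 <= octet <= 126
        if printable and octet not in EBCDIC_UNSAFE:
            out.append(chr(octet))          # safe literal
        else:
            out.append(f"={octet:02X}")     # rule 1 quoting
    return "".join(out)


print(encode_ebcdic_safe(b"user@example.com"))
```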

Because Quoted-Printable data is generally assumed to be
line-oriented, the breaks between its lines may be altered in
transport (translator's note: Unix, Windows and Mac systems use
different newline conventions), just as plain text mail has always
been altered in Internet mail when passing between systems with
differing newline conventions. If such alterations are likely to
corrupt the data, it is more sensible to use Base64 rather than
Quoted-Printable.
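A small experiment illustrates the difference. The function
`simulate_gateway` is a made-up stand-in for a gateway that rewrites
CRLF to bare LF; real gateways may behave differently. The decoding
calls come from Python's standard `base64` and `quopri` modules.

```python
import base64
import quopri


def simulate_gateway(encoded: bytes) -> bytes:
    """Hypothetical gateway that rewrites CRLF line breaks to bare LF."""
    return encoded.replace(b"\r\n", b"\n")


# Base64: line breaks in the encoded form carry no data, so the
# payload survives the rewrite byte for byte.
binary_payload = bytes(range(256))
b64_body = base64.encodebytes(binary_payload).replace(b"\n", b"\r\n")
b64_received = base64.decodebytes(simulate_gateway(b64_body))

# Quoted-Printable: hard line breaks are part of the text, so the
# decoded octets follow whatever newline convention arrived.
text = b"line one\r\nline two"
qp_body = text  # every octet here is a literal or part of a line break
qp_received = quopri.decodestring(simulate_gateway(qp_body))

print(b64_received == binary_payload)  # base64 payload unchanged
print(qp_received == text)             # QP text no longer byte-identical
```

For plain text this change is usually harmless, which is why
Quoted-Printable suits text; for data where every octet matters,
Base64 is the safer choice.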

5.1  Quoted-Printable Content-Transfer-Encoding

            The Quoted-Printable encoding is intended to represent  data
            that largely consists of octets that correspond to printable
            characters in the ASCII character set.  It encodes the  data
            in  such  a way that the resulting octets are unlikely to be
            modified by mail transport.  If the data being  encoded  are
            mostly  ASCII  text,  the  encoded  form of the data remains
            largely recognizable by humans.  A body  which  is  entirely
            ASCII  may also be encoded in Quoted-Printable to ensure the
            integrity of the data should  the  message  pass  through  a
            character-translating, and/or line-wrapping gateway.

            In this encoding, octets are to be represented as determined
            by the following rules:

                 Rule #1:  (General  8-bit  representation)  Any  octet,
                 except  those  indicating a line break according to the
                 newline convention of the canonical form  of  the  data
                 being encoded, may be represented by an "=" followed by
                 a two digit hexadecimal representation of  the  octet's
                 value. The digits of the hexadecimal alphabet, for this
                 purpose, are "0123456789ABCDEF". Uppercase letters must
                 be used when sending hexadecimal data, though a robust
                 implementation   may   choose  to  recognize  lowercase
                 letters on receipt. Thus, for  example,  the  value  12
                 (ASCII  form feed) can be represented by "=0C", and the
                 value 61 (ASCII  EQUAL  SIGN)  can  be  represented  by
                 "=3D".   Except  when  the  following  rules  allow  an
                 alternative encoding, this rule is mandatory.

                 Rule #2: (Literal representation) Octets  with  decimal
                 values  of 33 through 60 inclusive, and 62 through 126,
                 inclusive, MAY be represented as the  ASCII  characters
                 which  correspond  to  those  octets (EXCLAMATION POINT
                 through LESS THAN,  and  GREATER  THAN  through  TILDE,
                 respectively).

                 Rule #3: (White Space): Octets with values of 9 and  32
                 MAY   be  represented  as  ASCII  TAB  (HT)  and  SPACE
                 characters,  respectively,   but   MUST   NOT   be   so



            Borenstein & Freed                                 [Page 14]




            RFC 1341     MIME: Multipurpose Internet Mail Extensions     June 1992


                 represented at the end of an encoded line. Any TAB (HT)
                 or SPACE characters on an encoded  line  MUST  thus  be
                 followed  on  that  line  by a printable character.  In
                 particular, an "=" at  the  end  of  an  encoded  line,
                 indicating  a  soft line break (see rule #5) may follow
                 one or more TAB (HT) or SPACE characters.   It  follows
                 that  an  octet with value 9 or 32 appearing at the end
                 of an encoded line must  be  represented  according  to
                 Rule  #1.  This  rule  is  necessary  because some MTAs
                 (Message Transport  Agents,  programs  which  transport
                 messages from one user to another, or perform a part of
                 such transfers) are known to pad  lines  of  text  with
                 SPACEs,  and  others  are known to remove "white space"
                 characters from the end  of  a  line.  Therefore,  when
                 decoding  a  Quoted-Printable  body, any trailing white
                 space on a line must be deleted, as it will necessarily
                 have been added by intermediate transport agents.

                 Rule #4 (Line Breaks): A line  break  in  a  text  body
                 part,   independent   of  what  its  representation  is
                 following the  canonical  representation  of  the  data
                 being  encoded, must be represented by a (RFC 822) line
                 break,  which  is  a  CRLF  sequence,  in  the  Quoted-
                 Printable  encoding.  If isolated CRs and LFs, or LF CR
                 and CR LF sequences are allowed  to  appear  in  binary
                 data  according  to  the  canonical  form, they must be
                 represented   using  the  "=0D",  "=0A",  "=0A=0D"  and
                 "=0D=0A" notations respectively.

                 Note that many implementations may elect to encode  the
                 local representation of various content types directly.
                 In particular, this may apply to plain text material on
                 systems  that  use  newline conventions other than CRLF
                 delimiters. Such an implementation is permissible,  but
                 the  generation  of  line breaks must be generalized to
                 account for the case where alternate representations of
                 newline sequences are used.

                 Rule  #5  (Soft  Line  Breaks):  The   Quoted-Printable
                 encoding REQUIRES that encoded lines be no more than 76
                 characters long. If longer lines are to be encoded with
                 the  Quoted-Printable encoding, 'soft' line breaks must
                 be used. An equal sign  as  the  last  character  on an
                 encoded  line indicates such a non-significant ('soft')
                 line break in the encoded text. Thus if the "raw"  form
                 of the line is a single unencoded line that says:

                      Now's the time for all folk to come to the aid of
                      their country.

                 This  can  be  represented,  in  the   Quoted-Printable
                 encoding, as





                      Now's the time =
                      for all folk to come=
                       to the aid of their country.

                 This provides a mechanism with  which  long  lines  are
                 encoded  in  such  a  way as to be restored by the user
                 agent.  The 76  character  limit  does  not  count  the
                 trailing   CRLF,   but  counts  all  other  characters,
                 including any equal signs.
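Read from the receiving side, the rules above suggest a decoder along
the following lines. This is a sketch under stated assumptions, not a
conforming implementation: the name `qp_decode` is hypothetical, hard
line breaks are always emitted as CRLF, and malformed "=" sequences
are passed through unchanged rather than rejected.

```python
import re

# "=" followed by two hex digits; lowercase is accepted on receipt (rule 1).
_HEX = re.compile(rb"=([0-9A-Fa-f]{2})")


def qp_decode(body: bytes) -> bytes:
    """Decode a Quoted-Printable body (hypothetical helper)."""
    out = bytearray()
    lines = body.split(b"\n")
    for n, line in enumerate(lines):
        # Rule 3: trailing white space (and the CR of CRLF) was added
        # in transit or belongs to the line break, so drop it.
        line = line.rstrip(b" \t\r")
        # Rule 5: a final "=" is a soft line break, not data.
        soft = line.endswith(b"=")
        if soft:
            line = line[:-1]
        out += _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), line)
        # Rule 4: a hard line break becomes CRLF in the canonical form.
        if not soft and n < len(lines) - 1:
            out += b"\r\n"
    return bytes(out)


encoded = (b"Now's the time =\r\n"
           b"for all folk to come=\r\n"
           b" to the aid of their country.")
print(qp_decode(encoded).decode("ascii"))
```

Note that white space immediately before a soft line break is kept,
since the "=" that follows it is a printable character on that line.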

            Since the hyphen character ("-") is represented as itself in
            the  Quoted-Printable  encoding,  care  must  be taken, when
            encapsulating a quoted-printable encoded body in a multipart
            entity,  to  ensure that the encapsulation boundary does not
            appear anywhere in the encoded body.  (A good strategy is to
            choose a boundary that includes a character sequence such as
            "=_" which can never appear in a quoted-printable body.  See
            the   definition   of   multipart  messages  later  in  this
            document.)

            NOTE:  The quoted-printable encoding represents something of
            a   compromise   between   readability  and  reliability  in
            transport.   Bodies  encoded   with   the   quoted-printable
            encoding will work reliably over most mail gateways, but may
            not work  perfectly  over  a  few  gateways,  notably  those
            involving  translation  into  EBCDIC.  (In theory, an EBCDIC
            gateway could decode a quoted-printable body  and  re-encode
            it  using  base64,  but  such gateways do not yet exist.)  A
            higher  level  of  confidence  is  offered  by  the   base64
            Content-Transfer-Encoding.  A way to get reasonably reliable
            transport through EBCDIC gateways is to also quote the ASCII
            characters

                 !"#$@[\]^`{|}~

            according to rule #1.  See Appendix B for more information.

            Because quoted-printable data is  generally  assumed  to  be
            line-oriented,  it is to be expected that the breaks between
            the lines  of  quoted  printable  data  may  be  altered  in
            transport,  in  the  same  manner  that  plain text mail has
            always been altered in Internet mail  when  passing  between
            systems   with   differing  newline  conventions.   If  such
            alterations are likely to constitute  a  corruption  of  the
            data,  it  is  probably  more  sensible  to  use  the base64
            encoding rather than the quoted-printable encoding.










