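Form data sent as www_urlencode (application/x-www-form-urlencoded) generally has to be encoded, because some characters are unsafe in a URL.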
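RFC 1738 specifies that the characters in a Uniform Resource Locator (URL) must come from a subset of the US-ASCII character set. HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set in documents. This means that in data POSTed from an HTML FORM (or sent as part of a query string), every character outside the URL-safe subset must be encoded.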
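As a rough illustration of what this means for form data, the sketch below encodes a small set of fields with Python's standard `urllib.parse.urlencode`. The field names and values are made up for the example, and ISO-8859-1 is chosen as the byte encoding only because it is the character set discussed here.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, invented for this example.
form = {"name": "café au lait", "q": "a&b=c"}

# Non-ASCII characters are first turned into ISO-8859-1 bytes,
# then every unsafe byte is percent-encoded.
encoded = urlencode(form, encoding="iso-8859-1")
print(encoded)  # name=caf%E9+au+lait&q=a%26b%3Dc

# Decoding with the same charset recovers the original values.
decoded = parse_qs(encoded, encoding="iso-8859-1")
print(decoded)
```

Note that form encoding writes a space as "+", while the "&" and "=" inside a value are percent-encoded so they cannot be mistaken for field separators.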
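ISO-8859-1 (ISO-Latin) character set
The table below covers the complete ISO-8859-1 (ISO-Latin) character set. For each character range (in decimal) it gives the type of the characters, their values, and whether the characters in that range are safe.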
| Character range (decimal) | Type | Values | Safe/Unsafe |
| --- | --- | --- | --- |
| 0-31 | ASCII Control Characters | These characters are not printable | Unsafe |
| 32-47 | Reserved Characters | (space) !"#$%&'()*+,-./ | Unsafe |
| 48-57 | ASCII Digits | 0-9 | Safe |
| 58-64 | Reserved Characters | :;<=>?@ | Unsafe |
| 65-90 | ASCII Characters | A-Z | Safe |
| 91-96 | Reserved Characters | [\]^_` | Unsafe |
| 97-122 | ASCII Characters | a-z | Safe |
| 123-126 | Reserved Characters | {\|}~ | Unsafe |
| 127 | Control Characters | DEL (not printable) | Unsafe |
| 128-255 | Non-ASCII Characters | Characters outside US-ASCII | Unsafe |
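All unsafe ASCII characters need to be encoded, for example those in the ranges 32-47, 58-64, 91-96, and 123-126. This classification is conservative: RFC 1738 itself allows a few of these punctuation characters, such as "-", "_" and ".", to appear unencoded.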
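The range table can be expressed as a small lookup. The following is a minimal sketch that simply mirrors the table's safe ranges (digits and ASCII letters); the function name `classify` is an assumption made for this example, not part of any library.

```python
def classify(code: int) -> str:
    """Return 'Safe' or 'Unsafe' for an ISO-8859-1 code point, per the range table."""
    if not 0 <= code <= 255:
        raise ValueError("ISO-8859-1 code points lie in the range 0-255")
    is_digit = 48 <= code <= 57    # 0-9
    is_upper = 65 <= code <= 90    # A-Z
    is_lower = 97 <= code <= 122   # a-z
    return "Safe" if (is_digit or is_upper or is_lower) else "Unsafe"

# A few sample characters from different ranges.
for ch in ["A", "z", "7", " ", "#", "~", "\xe9"]:
    print(repr(ch), ord(ch), classify(ord(ch)))
```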
The following table explains why these characters are unsafe.
| Character | Unsafe Reason | Character Encode |
| --- | --- | --- |
| < | Delimiters around URLs in free text | %3C |
| > | Delimiters around URLs in free text | %3E |
| " | Delimits URLs in some systems | %22 |
| # | It is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. | %23 |
| { | Gateways and other transport agents are known to sometimes modify such characters | %7B |
| } | Gateways and other transport agents are known to sometimes modify such characters | %7D |
| \| | Gateways and other transport agents are known to sometimes modify such characters | %7C |
| \ | Gateways and other transport agents are known to sometimes modify such characters | %5C |
| ^ | Gateways and other transport agents are known to sometimes modify such characters | %5E |
| ~ | Gateways and other transport agents are known to sometimes modify such characters | %7E |
| [ | Gateways and other transport agents are known to sometimes modify such characters | %5B |
| ] | Gateways and other transport agents are known to sometimes modify such characters | %5D |
| ` | Gateways and other transport agents are known to sometimes modify such characters | %60 |
| + | Indicates a space in form-encoded data (spaces cannot be used in a URL; a space itself is encoded as %20) | %2B |
| / | Separates directories and subdirectories | %2F |
| ? | Separates the actual URL and the parameters | %3F |
| & | Separator between parameters specified in the URL | %26 |
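To cross-check codes like these, one option is to ask Python's standard `urllib.parse.quote` for the percent-encoding of each character. This is only a sketch; passing `safe=""` is a choice made here so that "/" is encoded as well, since `quote` leaves it alone by default.

```python
from urllib.parse import quote

# Characters that are unsafe in a URL, plus the space character.
unsafe = '<>"#{}|\\^[]`+/?& '

# Map each character to the percent-encoding the standard library produces.
encoded = {ch: quote(ch, safe="") for ch in unsafe}
for ch, code in encoded.items():
    print(repr(ch), code)

# A literal "+" becomes %2B; %20 is the encoding of the space itself.
print(encoded["+"], encoded[" "])  # %2B %20
```

One caveat: in Python 3.7 and later, `quote` follows RFC 3986 and treats "~" as unreserved, so it returns "~" unchanged instead of %7E.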
How it works
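A character is URL-encoded by converting its 8-bit value to two hexadecimal digits and prefixing them with "%". For example, the space character in US-ASCII is 32 in decimal, or 20 in hexadecimal, so its URL encoding is %20.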
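The rule can be written out by hand in a few lines. The sketch below is a simplified illustration, not a replacement for a library routine: the function name `percent_encode` is made up for this example, it treats only ASCII letters and digits as safe, and it assumes the text fits in ISO-8859-1.

```python
def percent_encode(text: str, charset: str = "iso-8859-1") -> str:
    """Percent-encode every byte that is not an ASCII letter or digit."""
    out = []
    for byte in text.encode(charset):
        if byte < 128 and chr(byte).isalnum():
            out.append(chr(byte))        # safe: keep the character as-is
        else:
            out.append(f"%{byte:02X}")   # unsafe: "%" plus two hex digits
    return "".join(out)

print(percent_encode("a b"))      # a%20b
print(percent_encode("x=1&y=2"))  # x%3D1%26y%3D2
print(percent_encode("café"))     # caf%E9
```

Encoding per byte is what makes the two-digit form sufficient: each ISO-8859-1 character is exactly one byte, so it always maps to a single "%XX" group.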
Source: http://hi.baidu.com/java_ruby/blog/item/38170c17ffa037014b90a76b.html



