Python encoding notes

License: CC 4.0 BY-SA

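This post looks at converting between byte strings and Unicode in Python 2. An example session shows how the decode method turns strings in different encodings into Unicode objects, focusing on common encodings such as GBK and UTF-8.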

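Encoding is one of the most painful things to deal with. In Python 2, decode apparently means "decode to Unicode from the original format": it turns a str object into a unicode object, and the encoding passed to it must be the one the original str is actually encoded in.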
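
A minimal sketch of that idea, written for Python 3 so it can be run today (an assumption on my part; the session in this note uses Python 2.7). In Python 3 the old str corresponds to bytes and the old unicode corresponds to str. The byte values are the ones the session shows for 中文.

```python
# Python 3 sketch: bytes plays the role of the Python 2 str,
# and str plays the role of the Python 2 unicode type.
raw = b"\xd6\xd0\xce\xc4"  # GBK-encoded bytes for two Chinese characters

# decode(): bytes -> text. The argument names the encoding the bytes are in.
text = raw.decode("gbk")
print(type(text).__name__, ascii(text))

# encode(): text -> bytes in whatever encoding you choose.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Naming the wrong source encoding fails instead of silently converting.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)
```

The point to take away is the direction: decode always goes from encoded bytes to Unicode text, and encode goes back.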

This author explains it well and is worth reading when you have time: http://wklken.me/posts/2013/08/31/python-extra-coding-intro.html

C:\Users\lucifer\Desktop\collection>python
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s="中文"
>>> sa=u"中文"
>>> type(s)
<type 'str'>
>>> type(sa)
<type 'unicode'>
>>> print s
中文
>>> print sa
中文
>>> s
'\xd6\xd0\xce\xc4'
>>> sa
u'\u4e2d\u6587'
>>> c=s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
>>> c=s.decode('gbk')
>>> c
u'\u4e2d\u6587'
>>> c=s.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> c=s.decode('ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
>>> c=s.decode('GB')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: GB
>>> c=s.decode('gb')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: gb
>>> c=s.decode('gb2312')
>>> c
u'\u4e2d\u6587'
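
The session shows two different kinds of failure. The literal arrived from the console as the bytes \xd6\xd0\xce\xc4, which are valid GBK/GB2312 but not valid UTF-8 or ASCII, hence the UnicodeDecodeError. The names 'gb' and 'GB' are simply not registered codec names, hence the LookupError. Name lookup is case-insensitive, which is why 'ascii' and 'ASCII' behave the same. Below is a minimal Python 3 sketch (an assumption, since the session uses Python 2.7) that checks a name with codecs.lookup and tries candidate encodings in order. The helper names is_known_encoding and decode_with_fallback are hypothetical, made up for this illustration.

```python
import codecs


def is_known_encoding(name):
    """Return True if the codec registry recognizes this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_with_fallback(raw, candidates=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the bytes")


raw = b"\xd6\xd0\xce\xc4"  # the bytes shown for s in the session

# Which of the names tried in the session are real codec names?
for name in ("gb", "GB", "gb2312", "gbk", "ASCII"):
    print(name, is_known_encoding(name))

text, used = decode_with_fallback(raw)
print(ascii(text), used)
```

UTF-8 is tried first because it is strict and rejects most non-UTF-8 input. A successful decode does not prove the guess was right, so when the real source encoding is known it is safer to name it directly.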