[Repost] Learning Python (13): Character Encoding in Python

License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/nolonely/p/6626162.html

Tags: #python #database

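This article explains the differences among ASCII, Unicode, and UTF-8, describes how Python handles ASCII and Unicode strings, and gives practical examples of encoding and decoding.

Reposted from http://www.cnblogs.com/BeginMan/p/3166363.html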

I. The differences among ASCII, Unicode, and UTF-8

See: http://www.cnblogs.com/kingstarspe/p/ASCII.html

II. Unicode and ASCII

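Python 2 can handle both Unicode and ASCII strings. To make the two look as similar as possible, Python strings were changed from a simple type into real objects: ASCII strings became StringType and Unicode strings became UnicodeType. They are used as follows:

>>> "hello world"     # ASCII string
'hello world'
>>> u"hello world"    # Unicode string
u'hello world'
>>>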

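1. str() and chr() work only with byte strings: chr() accepts only integers from 0 to 255, and str() returns a byte string. If a Unicode string is passed to str(), it is first converted implicitly with the default ASCII codec.

Reason: Unicode covers far more characters than ASCII, so if the Unicode string contains a character that ASCII does not have, the conversion raises an exception (UnicodeEncodeError).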
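The failure described above can be reproduced with a minimal sketch. It assumes that Python 2's str() applied to a Unicode string behaves like an explicit encode with the default ASCII codec; the helper name to_ascii_bytes is hypothetical, and the snippet is written so that it also runs on Python 3.

```python
def to_ascii_bytes(text):
    """Do explicitly what Python 2's str() does implicitly to a Unicode string.

    Hypothetical helper, for illustration only.
    """
    return text.encode('ascii')


# Pure ASCII text converts without trouble.
ascii_ok = to_ascii_bytes(u'hello world')

# Text with characters outside ASCII cannot be converted.
try:
    to_ascii_bytes(u'\u535a\u5ba2\u56ed')  # the three characters of "博客园"
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True
```

The escapes \u535a\u5ba2\u56ed are used instead of literal characters so that the example does not depend on the encoding of the source file.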

2. unicode() and unichr() can be regarded as the Unicode versions of str() and chr().

>>> unicode('hello world')
u'hello world'
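The relationship between unichr() and code points may be easier to see in a short sketch. The fallback to chr() is an assumption made so that the snippet also runs on Python 3, where unichr() no longer exists and chr() covers the whole Unicode range.

```python
try:
    unichr            # built-in on Python 2
except NameError:
    unichr = chr      # assumption: running on Python 3

code_point = ord(u'\u535a')   # character -> integer code point
char = unichr(0x535a)         # integer code point -> one-character Unicode string
```

ord() and unichr() are inverses of each other, just as ord() and chr() are for byte values.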

III. Encoding and decoding

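Codecs solve the problem of encoding (encode()) and decoding (decode()), so that text does not come out garbled.

A codec (coder/decoder) denotes an encoding scheme.

"""Write a Unicode string to a file on disk, then read it back and display it.
   UTF-8 is used for writing, and UTF-8 again for reading."""
CODEC = 'utf-8'
FILE = 'demo.txt'

strIn = u'BeginMan will be a great coder'
byte_strIn = strIn.encode(CODEC)      # encode as UTF-8 bytes

f = open(FILE, 'wb')                  # binary mode, because we write raw bytes
f.write(byte_strIn)
f.close()

f = open(FILE, 'rb')
byte_strOut = f.read()                # raw bytes read back from the file
f.close()

str_out = byte_strOut.decode(CODEC)   # decode the UTF-8 bytes back to Unicode
print(str_out)                        # output: BeginMan will be a great coder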
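Why the codec used for reading has to match the one used for writing can be seen in a small sketch. The sample string and the choice of GBK and Latin-1 as the "other" codecs are illustrative assumptions, not taken from the original post.

```python
text = u'\u535a\u5ba2\u56edCnblog'   # "博客园Cnblog" written with escapes

# The same text becomes different byte sequences under different codecs.
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# Decoding with the codec that was used for encoding restores the text.
round_trip = utf8_bytes.decode('utf-8')

# Decoding with a different codec yields garbled text instead of an error,
# because Latin-1 maps every byte to some character.
garbled = utf8_bytes.decode('latin-1')
```

Here round_trip equals the original text, while garbled has one character per UTF-8 byte and no longer resembles it.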

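Notes:

1. Always put the u prefix in front of string literals that appear in your program

s = '博客园Cnblog'    # do not write it this way; it easily turns into garbage such as: 鍗氬鍥瑿nblog
s = u'博客园Cnblog'   # correct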

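2. Do not use the str() function; use unicode() instead wherever possible

3. Do not use the outdated string module

4. There is no need to encode or decode Unicode strings inside the program itself; encoding and decoding are generally needed only when working with files, databases, the network, and so on.

5. String formatting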

>>> '%s %s' % ('Begin', 'man')
'Begin man'
# Recall what the earlier post on strings said: "a plain string combined with a unicode string is converted to a unicode string"
>>> u'%s %s' % (u'Begin', u'Man')
u'Begin Man'
>>> u'%s %s' % ('Begin', 'Man')
u'Begin Man'
>>> '%s %s' % (u'Begin', 'man')
u'Begin man'
>>> '%s %s' % ('Begin', u'man')
u'Begin man'