[Web Development with Python] (4) -- Python Basics: Chinese Characters

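This post uses Python code to show how the length of a Chinese string differs depending on its encoding. It compares the string lengths under UTF-8, Unicode, and GBK, and explains how many bytes a Chinese character occupies in each case.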

The previous post mentioned the length of Chinese strings. This post runs a few tests on that point, using the following code:

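#!/usr/bin/python
#-*- coding: utf-8 -*-
# Python 2 script

s = "中国"    # byte string (str): holds the UTF-8 bytes from the source file
ss = u"中国"  # unicode string
print s, type(s), len(s)
print ss, type(ss), len(ss)
print '-' * 40
print repr(s)
print repr(ss)
print '-' * 40
# Decode the UTF-8 bytes into a unicode object
s1 = s.decode('utf-8')
print s1,len(s1),type(s1)
print '-' * 40
# Decode to unicode first, then re-encode as GBK
s2 = s.decode('utf-8').encode('gbk')
print s2
print type(s2)
print len(s2)
print '-' * 40
# Encode the unicode string directly as GBK
s3 = ss.encode('gbk')
print s3
print type(s3)
print len(s3)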

The output is as follows:

中国 <type 'str'> 6
中国 <type 'unicode'> 2
----------------------------------------
'\xe4\xb8\xad\xe5\x9b\xbd'
u'\u4e2d\u56fd'
----------------------------------------
中国 2 <type 'unicode'>
----------------------------------------
�й
<type 'str'>
4
----------------------------------------
�й
<type 'str'>
4
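The garbled `�й` lines are most likely the GBK bytes being displayed by a terminal that expects UTF-8. The sketch below reproduces that effect under stated assumptions: it is written in Python 3 syntax (the post itself targets Python 2), and the names `gbk_bytes`, `shown`, and `recovered` are illustrative only.

```python
# Python 3 sketch: GBK bytes misread as UTF-8 turn into mojibake.
gbk_bytes = "\u4e2d\u56fd".encode("gbk")  # the GBK encoding of "中国"

# Roughly what a UTF-8 terminal does: invalid bytes become U+FFFD.
shown = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the matching codec recovers the original text.
recovered = gbk_bytes.decode("gbk")

print(len(gbk_bytes))    # 4 bytes, matching len(s2) in the post
print(ascii(shown))      # escaped form of the garbled text
print(len(recovered))    # 2 characters
```

The bytes themselves are fine. Only the codec used to display them is wrong, which is why `len(s2)` still reports 4.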

Additional notes:

To check Python's default encoding setting:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
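In Python 2 this default is the codec used when a `str` is implicitly converted to `unicode`, which is why the test code calls `decode('utf-8')` explicitly. As a rough sketch in Python 3 syntax (an assumption, since the post targets Python 2; `try_decode` is a hypothetical helper), this is what the ASCII codec does with the UTF-8 bytes of 中国:

```python
# Python 3 sketch: the ASCII codec rejects any byte >= 0x80.
utf8_bytes = "\u4e2d\u56fd".encode("utf-8")  # the six UTF-8 bytes of "中国"

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

ascii_result = try_decode(utf8_bytes, "ascii")  # fails: every byte is non-ASCII
utf8_result = try_decode(utf8_bytes, "utf-8")   # succeeds: two characters

print(ascii_result is None)  # True
print(len(utf8_result))      # 2
```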

Because the file header declares #-*- coding: utf-8 -*- (and the file is saved as UTF-8), the byte string s is already encoded in UTF-8.

In UTF-8, an English letter takes 1 byte and a Chinese character typically takes 3 bytes;

as a unicode object, a Chinese character counts as 1 character (two bytes when stored as UCS-2/UTF-16);

in GBK, a Chinese character takes 2 bytes. (Thanks to keakon for the correction.)
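The byte counts above can be checked directly. The following is a minimal sketch in Python 3 syntax (an assumption, since the post targets Python 2), where `str` is always Unicode text and `bytes` holds encoded data. The helper `encoded_length` is a hypothetical name introduced for illustration.

```python
# Python 3 sketch: compare character count with byte counts per encoding.
text = "\u4e2d\u56fd"  # "中国"

def encoded_length(s, encoding):
    """Number of bytes s occupies once encoded with the given codec."""
    return len(s.encode(encoding))

lengths = {
    "characters": len(text),                         # 2
    "utf-8": encoded_length(text, "utf-8"),          # 6 (3 bytes each)
    "gbk": encoded_length(text, "gbk"),              # 4 (2 bytes each)
    "utf-16-le": encoded_length(text, "utf-16-le"),  # 4 (2 bytes each, no BOM)
}
print(lengths)
print(encoded_length("A", "utf-8"))  # 1: ASCII letters take one byte in UTF-8
```

`utf-16-le` is used rather than `utf-16` so that the byte order mark does not inflate the count.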