Encoding and Decoding in Python

千尔玉

Tags: python, string, unicode

Original article: https://www.cnblogs.com/1208xu/p/12045127.html

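Inside Python, strings are represented as Unicode. So when converting between encodings, Unicode usually serves as the intermediate form: first decode the string from its original encoding into Unicode, then encode it from Unicode into the target encoding.
In Python 3, however, a str is always Unicode, so a str never needs to be decoded: you encode it directly. Only bytes (for example, data read from a file in binary mode or received over a network) need to be decoded.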
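A minimal sketch of the point above, using the two characters 神奇 from the sample HTML further down as example text: a Python 3 `str` is encoded straight to `bytes`, and only the `bytes` object has a `decode()` method. Note that `len()` counts characters for a `str` but bytes for a `bytes` object.

```python
text = "神奇"                      # str: a sequence of Unicode characters
data = text.encode("utf-8")       # encode: str -> bytes

print(type(text), len(text))      # <class 'str'> 2
print(type(data), len(data))      # <class 'bytes'> 6  (3 bytes per character in UTF-8)
print(data)                       # b'\xe7\xa5\x9e\xe5\xa5\x87'

# A Python 3 str has no .decode() method; only bytes can be decoded.
print(hasattr(text, "decode"), hasattr(data, "decode"))  # False True
```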
Reference: https://www.cnblogs.com/1208xu/p/12045127.html
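Here is how encoding and decoding work together. Suppose a string was originally stored in a file in GBK, and we want it in UTF-8. First we use the GBK rules to translate the raw bytes into characters, i.e. convert from GBK to Unicode: this is decoding. Once we have the characters, we look up which bytes represent them in UTF-8; those bytes are the result we want. This is encoding, the conversion from Unicode to UTF-8. Unicode therefore acts as an intermediary that every conversion between encodings passes through.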
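The GBK-to-UTF-8 conversion described above can be sketched as follows, first in memory and then for a file. The helper name `gbk_to_utf8` and the file names are hypothetical, chosen only for illustration; the file is created in a temporary directory so the example runs anywhere.

```python
import os
import tempfile


def gbk_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: convert GBK-encoded bytes to UTF-8 via str (Unicode)."""
    text = data.decode("gbk")      # decode: GBK bytes -> str (Unicode characters)
    return text.encode("utf-8")    # encode: str -> UTF-8 bytes


original = "编码和解码"
gbk_bytes = original.encode("gbk")      # 2 bytes per character in GBK
utf8_bytes = gbk_to_utf8(gbk_bytes)     # 3 bytes per character in UTF-8
print(len(gbk_bytes), len(utf8_bytes))  # 10 15
print(utf8_bytes.decode("utf-8") == original)  # True

# The same idea for a file: read with the source encoding, write with the target one.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_gbk.txt")   # hypothetical file names
    dst = os.path.join(tmp, "demo_utf8.txt")
    with open(src, "wb") as f:
        f.write(gbk_bytes)
    with open(src, encoding="gbk") as f:          # decoding happens while reading
        content = f.read()
    with open(dst, "w", encoding="utf-8") as f:   # encoding happens while writing
        f.write(content)
    with open(dst, "rb") as f:
        file_bytes = f.read()
    print(file_bytes == utf8_bytes)  # True
```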
In Python 3, every string (str) is Unicode.

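Encoding and decoding in Python 3:
Type: str, content: Unicode text, human-readable
|
| encode('utf-8'): encoding turns human-readable characters into machine-readable bytes (a byte sequence)
v
Type: bytes, encoding: UTF-8, machine-readable
|
| decode('utf-8'): decode according to the UTF-8 rules; decoding is the reverse of encoding
v
Type: str, content: Unicode text, human-readable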
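A small sketch of the round trip in the diagram, plus what happens when bytes are decoded with a codec other than the one used to encode them. ASCII is chosen as the mismatched codec here only because its failure is easy to predict; the same rule applies to any mismatch.

```python
text = "神奇"
data = text.encode("utf-8")         # str -> bytes
restored = data.decode("utf-8")     # bytes -> str, using the same codec
print(restored == text)             # True

# Decoding with a codec that does not match the one used for encoding fails.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii cannot decode:", exc.reason)  # ordinal not in range(128)

# errors="replace" substitutes U+FFFD for each undecodable byte instead of raising.
replaced = data.decode("ascii", errors="replace")
print(replaced)
```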

from lxml import etree

xml='''
<div>
    <ul>
        <li class="item-0"><a href="www.baidu.com">baidu</a>
        <li class="item-1"><a href="https://blog.youkuaiyun.com/qq_25343557">myblog</a>
        <li class="item-2"><a href="https://www.youkuaiyun.com/">csdn</a>
        <li class="item-3"><a href="https://hao.360.cn/?a1004">神奇</a>

'''
print(type(xml))   #str
html=etree.HTML(xml)    # parse a string containing an HTML document into an _Element object
print(type(html))
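result1=etree.tostring(html,encoding='utf-8')  # tostring() outputs the repaired HTML (missing closing tags added), but the result is bytes
# Without encoding='utf-8', tostring() defaults to ASCII and the Chinese text is escaped as numeric character references (e.g. &#31070;&#22855;) instead of readable characters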
print(result1)
print(type(result1))        #bytes
print(type(result1.decode()))     # decode; with no argument, bytes.decode() uses UTF-8 -> str
print(result1.decode())
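lxml's `etree.tostring()` also accepts `encoding="unicode"`, which returns a `str` directly so no `decode()` is needed. The sketch below shows the same three behaviours (default ASCII bytes with character references, UTF-8 bytes, and a `str`) with the standard library's `xml.etree.ElementTree` instead, so it runs without lxml installed. This is a swapped-in library, and it uses a well-formed fragment because ElementTree, unlike lxml's HTML parser, does not repair missing closing tags.

```python
import xml.etree.ElementTree as ET

# Well-formed fragment based on the last <li> of the sample above.
root = ET.fromstring('<li class="item-3"><a href="https://hao.360.cn/?a1004">神奇</a></li>')

ascii_out = ET.tostring(root)                     # default us-ascii -> bytes, non-ASCII escaped
utf8_out = ET.tostring(root, encoding="utf-8")    # bytes, UTF-8 encoded
text_out = ET.tostring(root, encoding="unicode")  # str, no decode() needed

print(ascii_out)                # the characters appear as &#31070;&#22855;
print(utf8_out.decode("utf-8")) # readable after decoding
print(text_out)                 # already readable
```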

For more on encoding and decoding in Python, see:
https://zhuanlan.zhihu.com/p/38293267
https://www.cnblogs.com/OldJack/p/6658779.html