Python 3 Encoding

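This post is still a work in progress and its accuracy is not guaranteed, so read it with care.
After going through all sorts of blog explanations, I found it better to just check the documentation.
First, note that this post is about Python 3, which changed a number of things relative to Python 2.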

  • unicode

    Python 3 renames the Python 2 unicode type to str.

  • str

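    class str(object=b'', encoding='utf-8', errors='strict')

    Returns a string version of object. If neither encoding nor errors is given, str(object) returns type(object).__str__(object); if object does not define __str__(), str() falls back to repr(object). Otherwise, object should be bytes-like, and the result is equivalent to object.decode(encoding, errors).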
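The two behaviours of str() described above can be sketched as follows. This is a minimal illustration; the Point class is a hypothetical example and not part of the documentation.

```python
class Point:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __repr__(self):
        return "Point(1, 2)"


# No encoding/errors given: str() uses __str__, which falls back to __repr__.
plain = str(Point())

# encoding given: the object must be bytes-like and is decoded.
raw = "编码".encode("utf-8")
decoded = str(raw, encoding="utf-8")

# Without encoding, a bytes object is NOT decoded; str() returns its repr.
informal = str(b"abc")

print(plain, decoded, informal)
```

Note the last case: str(b"abc") gives "b'abc'", which is a common source of stray b'...' text in output.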

  • Bytes

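    A bytes object is an immutable sequence of bytes, so every element must be an integer in [0, 255]. Bytes literals may contain only ASCII characters; values in [128, 255] must be written as escape sequences (for example \xe9).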
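A short sketch of these rules for bytes, with arbitrary values chosen for illustration:

```python
# ASCII characters can appear directly in a bytes literal.
ascii_part = b"abc"

# Values >= 128 must be written as escapes; a literal such as b"é"
# would be rejected with a SyntaxError.
high = b"\xe9\xff"

# Indexing or iterating over bytes yields integers in range(256).
values = list(high)

# Building bytes from an out-of-range integer fails.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(ascii_part, values, out_of_range_ok)
```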

  • Encodings and Unicode

    Strings are stored internally as sequences of code points in range 0x0-0x10FFFF. (See PEP 393 for more details about the implementation.) Once a string object is used outside of CPU and memory, endianness and how these arrays are stored as bytes become an issue. As with other codecs, serialising a string into a sequence of bytes is known as encoding, and recreating the string from the sequence of bytes is known as decoding.

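    In memory a str is a sequence of code points; to store or transmit it, it has to be converted to bytes, which is why encode and decode exist.
    P.S. A str uses the Unicode character set, which can be serialized with several encodings. UTF-16 and UTF-32 use 2-byte and 4-byte code units, so byte order (big endian or little endian) matters, and a BOM ('Byte Order Mark') can be used to resolve it. UTF-8 is a variable-length encoding (1 to 4 bytes per code point) built from single-byte code units, so it has no byte-order problem and does not need a BOM.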
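To make the byte-order point concrete, here is a small sketch that encodes the single character "é" (chosen arbitrarily) with several codecs from the standard library:

```python
import codecs

text = "é"  # one code point, U+00E9

# UTF-8: a sequence of single bytes, so there is no byte order to choose.
utf8 = text.encode("utf-8")

# UTF-16 with an explicit byte order: same code unit, bytes swapped.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# The generic "utf-16" codec prepends a BOM so a decoder can tell the order.
utf16 = text.encode("utf-16")
has_bom = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding reads the BOM and recovers the original string.
roundtrip = utf16.decode("utf-16")

print(utf8, utf16_le, utf16_be, has_bom, roundtrip)
```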

  • I/O

    There are three main types of I/O: text I/O, binary I/O and raw I/O. These are generic categories, and various backing stores can be used for each of them. A concrete object belonging to any of these categories is called a file object. Other common terms are stream and file-like object.
    ...
    All streams are careful about the type of data you give to them. For example giving a str object to the write() method of a binary stream will raise a TypeError. So will giving a bytes object to the write() method of a text stream.

    • Text I/O
      A text stream reads and writes str. The encoding parameter selects the encoding; if it is omitted, locale.getpreferredencoding(False) is used as the default.

    • Binary I/O

      Binary I/O (also called buffered I/O) expects and produces bytes objects.

      No encoding or decoding is performed.

    • Raw I/O

      Raw I/O (also called unbuffered I/O) is generally used as a low-level building-block for binary and text streams;
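The type strictness of streams, and the way a text stream sits on top of a binary one, can be sketched with the in-memory classes from the io module. This is an illustration only; real files behave the same way with respect to types.

```python
import io

text_stream = io.StringIO()    # in-memory text stream, accepts str
binary_stream = io.BytesIO()   # in-memory binary stream, accepts bytes

text_stream.write("héllo")
binary_stream.write("héllo".encode("utf-8"))

# Giving the wrong type to either kind of stream raises TypeError.
errors = []
try:
    binary_stream.write("héllo")
except TypeError:
    errors.append("binary")
try:
    text_stream.write(b"hello")
except TypeError:
    errors.append("text")

# A text stream is an encoding layer on top of a binary stream.
wrapper = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
wrapper.write("é")
wrapper.flush()
underlying = wrapper.buffer.getvalue()

print(errors, underlying)
```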

  • open()

    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a corresponding file object.

    Depending on the mode, open() returns a text stream or a binary stream. In text mode, if encoding is None, locale.getpreferredencoding(False) is used.
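A minimal sketch of the two modes, using a temporary file whose name (demo.txt) is made up for the example. Passing encoding explicitly avoids depending on the locale default:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: str in, encoded to bytes on disk with the given encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write("编码")

# Text mode: bytes on disk are decoded back to str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: the raw bytes, no decoding at all.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, type(raw).__name__, len(text), len(raw))
```

The same file is two characters long when read as text and six bytes long when read in binary mode.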

  • print()

    print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
    Print objects to the text stream file….
    All non-keyword arguments are converted to strings like str() does and written to the stream……

    print() converts each object with str() and writes the result to file.
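As a small sketch of this behaviour, print() can be pointed at an in-memory text stream to see exactly what it writes. The Temperature class is a hypothetical example.

```python
import io


class Temperature:
    """Hypothetical class with a custom __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees} °C"


buffer = io.StringIO()

# Each positional argument is converted with str(), joined with sep,
# terminated with end, and written to the text stream given as file.
print(Temperature(21), 42, sep=" | ", file=buffer)

output = buffer.getvalue()
```

Because file must be a text stream, the conversion from str to bytes is done by the stream's own encoding, not by print() itself.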

  • sys.getdefaultencoding()

    Return the name of the current default string encoding used by the Unicode implementation.
    In Python 3 the default encoding is utf-8.
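It is easy to confuse this value with the locale encoding used for text files. A quick sketch of the difference:

```python
import locale
import sys

# Default used by str.encode() and bytes.decode() when no codec is named.
default_encoding = sys.getdefaultencoding()

# Default used for text I/O when no encoding is passed; platform dependent.
locale_encoding = locale.getpreferredencoding(False)

# encode() without arguments therefore produces UTF-8 bytes.
implicit = "é".encode()

print(default_encoding, locale_encoding, implicit)
```

The two values can differ, for example on a system whose locale encoding is not UTF-8.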

  • Encoding declarations

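    If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration; the first group of this expression names the encoding of the source code file.
    Source code can be saved in a variety of encodings, such as gbk. Without this declaration the Python interpreter has no way of knowing which encoding was used, so it assumes UTF-8. If the file was actually saved in a different encoding, this easily leads to errors.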
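A sketch of how such a declaration is recognised, first with the regular expression given in the language reference and then with tokenize.detect_encoding from the standard library. The sample source lines are made up for the example.

```python
import io
import re
import tokenize

# Regular expression for encoding declarations, per the language reference.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

first_line = "# -*- coding: gbk -*-"
declared = CODING_RE.search(first_line).group(1)

# tokenize.detect_encoding inspects the first two lines of raw source bytes.
source = b"# -*- coding: gbk -*-\nprint('hi')\n"
detected, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

# With no declaration (and no BOM) the result is the UTF-8 default.
plain_source = b"print('hi')\n"
fallback, _ = tokenize.detect_encoding(io.BytesIO(plain_source).readline)

print(declared, detected, fallback)
```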

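File encoding: the encoding used to turn the text shown in a file into the bytes stored on disk.
After all this, the main goal is to pin down where garbled text comes from. Take file reading and writing as an example. We open a file with open(encoding=A, mode='t') and get a text stream; to read and write correctly we need the right encoding. Suppose the file was actually saved with encoding B: if A and B differ, the bytes are decoded with the wrong codec, which results in either a UnicodeDecodeError or garbled text.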
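The mismatch can be reproduced in a few lines. This is a sketch under assumed values: gbk stands in for B, utf-8 and latin-1 stand in for a wrong A, and the file name is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "gbk.txt")

# The file is saved with encoding B = gbk.
with open(path, "w", encoding="gbk") as f:
    f.write("编码")

# Case 1: A = utf-8. The gbk bytes are not valid UTF-8, so decoding fails.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Case 2: A = latin-1 accepts any byte value, so reading "succeeds"
# but each byte becomes one unrelated character.
with open(path, "r", encoding="latin-1") as f:
    garbled = f.read()

# Case 3: A == B restores the original text.
with open(path, "r", encoding="gbk") as f:
    correct = f.read()

print(utf8_failed, repr(garbled), correct)
```

Case 2 is the more dangerous one in practice, because no exception is raised and the garbled text can be written back out unnoticed.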
