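1. To read a UTF-8 file without a BOM, pass this argument to open: encoding = 'utf-8'
2. To read a UTF-8 file with a BOM, pass this argument to open: encoding = 'utf-8-sig'
3. To read a GBK file (no BOM), pass this argument to open: encoding = 'gbk'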
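The three cases above can be checked with a small round-trip sketch. The helper name roundtrip and the sample string are made up for illustration; the only assumption is that temporary files can be created on the machine running it.

```python
import os
import tempfile

text = "编码 test"  # sample content, chosen only for illustration

def roundtrip(write_encoding, read_encoding):
    """Write `text` with one encoding, then read it back with another."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)

plain = roundtrip("utf-8", "utf-8")              # case 1: no BOM
with_bom = roundtrip("utf-8-sig", "utf-8-sig")   # case 2: BOM written, then stripped on read
gbk = roundtrip("gbk", "gbk")                    # case 3: GBK
# Reading a BOM file as plain 'utf-8' leaves U+FEFF at the start of the data.
leaked = roundtrip("utf-8-sig", "utf-8")
# 'utf-8-sig' also reads files that have no BOM.
tolerant = roundtrip("utf-8", "utf-8-sig")
```

The last two lines show why 'utf-8-sig' is a reasonable default for reading UTF-8 when it is unknown whether a BOM is present: it strips the BOM if there is one and otherwise behaves like 'utf-8'.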
A catch-all approach (detect the encoding automatically with chardet):
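import os
import chardet  # third-party package: pip install chardet

# filename is the path of the file to read.
# Sample the first bytes only; a sample this small may not be enough for a reliable guess.
sample_size = min(32, os.path.getsize(filename))
with open(filename, 'rb') as f:
    raw = f.read(sample_size)
result = chardet.detect(raw)
encoding = result['encoding']  # None if chardet could not decide
# Reopen in text mode with the detected encoding.
with open(filename, 'r', encoding=encoding) as infile:
    data = infile.read()
print(data)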
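If only BOM detection is needed, the standard library is enough. The following is a minimal sketch, not a replacement for chardet: the function name sniff_bom and the 'utf-8' fallback are assumptions made for this example, and a file without a BOM (GBK, for instance) simply gets the fallback.

```python
import codecs

def sniff_bom(raw, default="utf-8"):
    """Return a codec name based on the BOM at the start of `raw` (bytes)."""
    # UTF-32 must be checked before UTF-16: BOM_UTF32_LE begins with BOM_UTF16_LE.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return default  # no BOM found; this fallback is only a guess
```

The returned names ('utf-8-sig', 'utf-16', 'utf-32') are the codecs that consume the BOM while decoding, so the result can be passed directly as the encoding argument of open.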
References:
Reading Unicode file data with BOM chars in Python
http://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python#comment18629764_13591421
The Python API documentation covers this in detail:
All of these encodings can only encode 256 of the 1114112 codepoints defined in Unicode. A simple and straightforward way tha

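This article describes how to read text files in different encodings in Python 3, including UTF-8 without a BOM, UTF-8 with a BOM (using 'utf-8-sig'), and GBK. It also discusses the UTF-32 encoding, the role of the BOM, and the characteristics of UTF-8, noting that the BOM is usually optional in UTF-8 and that the utf-8-sig encoding exists to improve the odds of detection.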