UTF-8 and GBK

License: CC 4.0 BY-SA

Tags: #python

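This post describes a UnicodeDecodeError the author ran into while copying a file with Python: the source file was created by PowerShell redirection in one encoding, while Python's open() read it with a different default codec (GBK, according to the error message). The author worked around it by creating the file with an editor (ATOM) instead, without changing any system default encoding.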

While learning Python, I ran into an encoding problem in a small exercise and tried to solve it.
First, the code (source: Learn PYTHON 3 the HARD WAY):

from sys import argv
from os.path import exists

script, from_file, to_file = argv

print(f"Copying from {from_file} to {to_file}.")

in_file = open(from_file)  # no encoding given, so Python falls back to the locale default (GBK here)
indata = in_file.read()

print(f"The input file is {len(indata)} bytes long.")

print(f"Does the output file exist? {exists(to_file)}")
print("Ready, hit RETURN to continue, CTRL+C to abort.")

input()

out_file = open(to_file, 'w')
out_file.write(indata)

print("Alright, all done.")

out_file.close()
in_file.close()

The code is simple: it just copies a file. I ran it from PowerShell with the following commands:

echo "This is a test file." > test.txt
cat test.txt
python ex17.py test.txt new-file.txt

Then it failed with an error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence
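One way to see what the error is pointing at is to look at the first raw bytes of the file. The sketch below is only a heuristic, and the helper names `sniff_bom` and `sniff_file` are made up for illustration: it checks whether the data starts with a known byte-order mark (BOM) and suggests a codec if so. A file that begins with the bytes FF FE, for example, is most likely UTF-16 little-endian.

```python
import codecs

# Known byte-order marks, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes):
    """Return a codec name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Hypothetical helper: read the first 4 bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Calling `sniff_file("test.txt")` on the file from this exercise would show whether it carries a BOM; a result of `None` only means no BOM was found, not that the file is in any particular encoding.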

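The error says that the gbk codec cannot decode the byte 0xff. As a beginner I had no idea what that meant, but seeing "unicode" and "gbk" made me think of encoding and decoding. From what I found, PowerShell on a Chinese-language Win10 uses GB2312, while UTF-8 is the usual encoding in programming.
So I assumed that a file created with echo in PowerShell must be in GB2312 or something similar, while ATOM, which I use to write Python scripts, uses a different encoding. Looking closer, the "gbk" in the message is the codec Python picked to read the file (open() without an encoding argument falls back to the system locale, which is GBK on Chinese Windows), and a 0xff byte at position 0 matches the FF FE byte-order mark of UTF-16 LE, which is what > writes by default in Windows PowerShell 5.1. Either way, the file was written in one encoding and read with another, and that mismatch is the root cause of the error.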
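The mismatch can be reproduced in memory without PowerShell. This is a minimal sketch under one assumption: that the file contains the test sentence encoded as UTF-16 LE with a BOM, which is what Windows PowerShell 5.1 redirection produces by default. The helper `try_decode` is made up for illustration.

```python
import codecs
import locale

text = "This is a test file.\r\n"
# Assumption: these are the bytes that `echo ... > test.txt` wrote
# (UTF-16 LE preceded by the FF FE byte-order mark).
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def try_decode(raw: bytes, encoding: str):
    """Return (True, decoded text) on success or (False, error message) on failure."""
    try:
        return True, raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return False, str(exc)


# The locale encoding that open() falls back to when `encoding` is omitted
# (this differs per machine, and UTF-8 mode changes it).
print(locale.getpreferredencoding(False))

print(try_decode(data, "gbk"))     # fails on the very first byte, 0xff
print(try_decode(data, "utf-16"))  # succeeds; the BOM is consumed by the codec
```

The GBK attempt stops at position 0 because 0xff is not a valid lead byte in GBK, which matches the message Python printed.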
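So how can this be fixed?
One option is to pass encoding=XXX when opening the file in the code. I took a shortcut instead: since the file created by PowerShell has a problematic encoding, I simply created it with ATOM. I didn't want to change PowerShell's default encoding, and I don't know whether changing ATOM's would cause other problems. Safety first.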
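For reference, the encoding= route might look like the sketch below. It is not the code from the book: `copy_text` and its default arguments are assumptions chosen for this case (a UTF-16 source file, a UTF-8 copy), and the source encoding has to be adjusted to whatever the input file really uses.

```python
def copy_text(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Copy a text file, decoding and re-encoding with explicit codecs.

    Returns the number of characters copied.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src, encoding=src_encoding, newline="") as in_file:
        indata = in_file.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as out_file:
        out_file.write(indata)
    return len(indata)
```

With this, `copy_text("test.txt", "new-file.txt")` would read the PowerShell-created file as UTF-16 and write the copy as UTF-8, regardless of the system locale. Using `with` also closes both files automatically.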