Reference: https://blog.youkuaiyun.com/lqzdreamer/article/details/76549256
Today I practiced counting English word frequencies by reading a text file (Walden.txt, the English edition of Walden). Reading Walden.txt raised this error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence".
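A byte 0xbf at position 2 is consistent with a UTF-8 byte order mark (EF BB BF) at the start of the file. That is only an assumption, since the actual bytes of Walden.txt are not shown here. The sketch below reproduces the same kind of failure on a made-up in-memory sample and shows that decoding with 'utf-8-sig' avoids it. The sample bytes and the helper name decode_text are hypothetical, chosen for illustration.

```python
# Hypothetical sample: a UTF-8 BOM followed by a line break and some text.
sample = b'\xef\xbb\xbf\r\nWalden, by Henry David Thoreau'

def decode_text(data, encodings=('utf-8-sig', 'gb18030')):
    """Try each candidate encoding in order and return (text, encoding)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')

# Decoding the sample as gbk fails, much like the error quoted above.
try:
    sample.decode('gbk')
    gbk_failed = False
except UnicodeDecodeError as err:
    gbk_failed = True
    print(err)

# 'utf-8-sig' strips the BOM and decodes the rest as UTF-8.
text, used = decode_text(sample)
print(used, repr(text))
```

If the real file does turn out to be UTF-8, passing encoding='utf-8-sig' to open() would read it without discarding any bytes, whereas errors='ignore' silently drops whatever cannot be decoded.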
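The underlying problem is text decoding: when open() is called without an encoding argument it uses the system default, which here is gbk, as the error message shows.
Code after modifying it according to the reference: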
path = 'C:\\Users\\yao\\Desktop\\Walden.txt'
# Decode as gb18030 and silently drop any bytes that cannot be decoded
file = open(path, encoding='gb18030', errors='ignore')
# Write a cleaned copy using the system default encoding
file2 = open('C:\\Users\\yao\\Desktop\\Waldenpython2.txt', 'w')
file2.write(file.read())
file.close()
file2.close()

# Re-read the cleaned copy and count how often each word occurs
with open('C:\\Users\\yao\\Desktop\\Waldenpython2.txt', 'r') as text:
    words = text.read().split()
    print(words)
    for word in words:
        print('{}-{} time'.format(word, words.count(word)))
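The loop above calls words.count(word) for every word, so a repeated word is printed once per occurrence and the list is rescanned each time. One possible alternative, not taken from the reference post, is collections.Counter from the standard library. The sample sentence, the helper name count_words, and the lowercasing and punctuation stripping are all assumptions added for illustration.

```python
import string
from collections import Counter

def count_words(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Skip tokens that were pure punctuation and are now empty
    return Counter(w for w in words if w)

# Hypothetical sample text standing in for the contents of Walden.txt
sample = 'The mass of men lead lives of quiet desperation. The mass!'
counts = count_words(sample)

# Print each distinct word once, most frequent first
for word, n in counts.most_common(3):
    print('{}-{} time'.format(word, n))
```

Counter makes a single pass over the words, and most_common() returns each distinct word once, which keeps the output readable for a book-length text.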