A Uniform Way to Avoid Encoding Errors When Reading .csv Files

Latest recommended article published on 2025-05-17 17:07:22

算法小菜鸟成长心得

Tags: python, programming languages

Article link: https://blog.youkuaiyun.com/qq_39865117/article/details/146914549

Problem:

When reading a CSV file with pd.read_csv(), have you ever run into this error?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 11: invalid continuation byte

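Cause:

This happens because the encoding of the CSV file is not known in advance: sometimes it is GBK, sometimes UTF-8. pd.read_csv() assumes UTF-8 by default, so a file saved as GBK fails to decode.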
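
The mismatch can be reproduced without any file or third-party library. The snippet below is a minimal sketch: the sample text and the helper name try_decode are assumptions made for illustration, not part of the original post. It encodes a short string as GBK and shows that decoding it as UTF-8 fails while decoding it as GBK succeeds.

```python
# Illustrative only: the sample text and helper name are assumptions.
text = "中文,数据"
gbk_bytes = text.encode("gbk")  # bytes as a GBK-saved CSV would contain them


def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_result = try_decode(gbk_bytes, "utf-8")  # GBK bytes are not valid UTF-8 here
gbk_result = try_decode(gbk_bytes, "gbk")     # round-trips back to the original text
```

This is the same failure that pd.read_csv() reports when it is given a GBK file without an explicit encoding.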

Solution:

To avoid errors or garbled text when reading, detect the file's encoding first, then pass the detected value to pd.read_csv(file, encoding='detected encoding').

Complete code:

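import chardet
import pandas as pd

def read_csv_with_encodings(file_path):
    # Read the raw bytes and let chardet guess the encoding
    with open(file_path, 'rb') as f:
        result = chardet.detect(f.read())
    # chardet may report None; falling back to UTF-8 here is an assumption
    encoding = result['encoding'] or 'utf-8'
    print(f"File encoding: {encoding}")

    # Pass the detected encoding to pandas and return the DataFrame
    df = pd.read_csv(file_path, encoding=encoding)
    return df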
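
Statistical detection can guess wrong, especially on short files, so a stricter alternative is to try a short list of candidate encodings in order. The following is a minimal sketch of that idea, not the method from the post: the function name pick_encoding and the candidate list ("utf-8-sig", then "gb18030", a superset of GBK) are assumptions chosen for illustration.

```python
def pick_encoding(raw: bytes, candidates=("utf-8-sig", "gb18030")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: the name and the default candidates are assumptions.
    Returns None if no candidate fits.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Example inputs built in memory instead of being read from a file
utf8_choice = pick_encoding("中文,数据".encode("utf-8"))
gbk_choice = pick_encoding("中文,数据".encode("gbk"))
```

UTF-8 is tried first because it is strict and rejects most non-UTF-8 byte sequences, whereas GB18030 accepts a much wider range of bytes. For a real file, the bytes would come from open(file_path, 'rb').read(), and the returned name can be passed as the encoding argument of pd.read_csv().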