AFAIK, the Python (v2.6) csv module can't handle Unicode data by default, correct? The Python docs include an example of how to read from a UTF-8 encoded file, but that example only returns each CSV row as a list.
I'd like to access the row columns by name, as csv.DictReader does, but with a UTF-8 encoded CSV input file.
Can anyone tell me how to do this efficiently? I will have to process CSV files that are hundreds of megabytes in size.
Solution
Actually, I came up with an answer myself (sorry for replying to my own question):
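import csv
def UnicodeDictReader(utf8_data, **kwargs):
    # In Python 2, DictReader yields byte strings; decode keys and values to unicode.
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        # dict() over a generator also runs on 2.6; dict comprehensions need 2.7+.
        yield dict((unicode(key, 'utf-8'), unicode(value, 'utf-8')) for key, value in row.iteritems())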
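For anyone reading this on Python 3 rather than 2.6, the decoding wrapper should not be needed, because the csv module there works on text streams directly. Below is a minimal sketch under that assumption; iter_rows and the in-memory sample data are made up for illustration and are not part of the answer above.

```python
import csv
import io


def iter_rows(text_stream, **kwargs):
    """Yield each CSV row as a dict keyed by column name (Python 3 only)."""
    # csv.DictReader parses lazily, so only one row is held at a time.
    for row in csv.DictReader(text_stream, **kwargs):
        yield row


# Hypothetical sample: an in-memory stand-in for a UTF-8 encoded CSV file.
sample_bytes = "name,city\nZoë,Köln\n".encode("utf-8")
stream = io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8", newline="")

rows = list(iter_rows(stream))
```

With a real file, the stream would instead come from open(path, encoding="utf-8", newline=""). Iterating over iter_rows(...) directly, rather than calling list() on it, keeps memory use flat for inputs in the hundreds-of-megabytes range.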
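This article addresses the Python 2.6 csv module's inability to handle Unicode data and provides a custom UnicodeDictReader function for reading column values by name from a UTF-8 encoded CSV file. It is suited to processing large (megabyte-scale) CSV files.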