import pandas as pd
# Pass the path directly so pandas opens and closes the file itself
df = pd.read_csv('text.csv', encoding='utf-8', engine='python')
# Read the data row by row into dictionaries; assume each row has three fields: item_id, info, title
dict_item_id = {}
dict_info = {}
dict_title = {}
dict_item_id_reverse = {}
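for i in range(len(df)):
    # i is the row number; with the default RangeIndex it is also the index label
    dict_item_id[i] = df["item_id"][i]
    dict_info[i] = df["info"][i]
    dict_title[i] = df["title"][i]
    # Reverse mapping: item_id -> row number
    dict_item_id_reverse[df["item_id"][i]] = i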
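The same four dictionaries can also be built without an explicit index loop. This is a minimal sketch, assuming the CSV has the columns item_id, info and title and keeps the default RangeIndex (so the row number i equals the index label). The sample rows are made up for illustration and are read from an in-memory string instead of text.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for text.csv
csv_text = """item_id,info,title
101,red shirt,Shirt
102,blue jeans,Jeans
103,black hat,Hat
"""

df = pd.read_csv(io.StringIO(csv_text))

# Series.to_dict() maps index label -> value; with the default RangeIndex
# the label is the row number i, matching the loop version
dict_item_id = df["item_id"].to_dict()
dict_info = df["info"].to_dict()
dict_title = df["title"].to_dict()

# Reverse mapping: item_id -> row number
dict_item_id_reverse = {item_id: i for i, item_id in dict_item_id.items()}
```

If the DataFrame has a non-default index (for example after filtering rows), call `df.reset_index(drop=True)` first so that the keys are again 0, 1, 2, ...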
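Using the row number i as the shared dictionary key links the item_id, info and title fields of the same row, which makes later processing easier. The goal is to extract the value of each field for every row into its own dictionary.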
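Because all four dictionaries share the row number i, an item's other fields can be fetched from its item_id in two steps: item_id to i via dict_item_id_reverse, then i to info and title. A minimal sketch with hypothetical hard-coded dictionaries in the shape the loop produces; the helper name lookup_item is made up for illustration.

```python
# Hypothetical dictionaries in the same shape as those built by the loop
dict_item_id = {0: 101, 1: 102}
dict_info = {0: "red shirt", 1: "blue jeans"}
dict_title = {0: "Shirt", 1: "Jeans"}
dict_item_id_reverse = {101: 0, 102: 1}


def lookup_item(item_id):
    """Return (info, title) for an item_id, or None if it is unknown.

    lookup_item is a hypothetical helper, not part of pandas.
    """
    i = dict_item_id_reverse.get(item_id)
    if i is None:
        return None
    return dict_info[i], dict_title[i]


print(lookup_item(102))  # ('blue jeans', 'Jeans')
print(lookup_item(999))  # None
```

Note that if item_id is not unique, the loop overwrites earlier entries, so dict_item_id_reverse keeps only the last row number for each id.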
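1. When pandas.read_csv() reads a file whose separator is '::', it emits the following warning:
Warning: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex)
Solution: the default C engine does not support separators longer than one character (other than '\s+'), so pandas treats '::' as a regular expression and silently switches to the Python engine. Passing engine='python' explicitly avoids the fallback and removes the warning: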
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv("D:/ratings.dat", sep='::', names=header,engine='python')
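To check the fix without the original D:/ratings.dat file, the sketch below parses a few made-up lines in the same user_id::item_id::rating::timestamp layout from an in-memory string and confirms that no ParserWarning is raised when engine='python' is passed explicitly. The sample values are hypothetical.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

# Hypothetical sample lines in the same '::'-separated layout as ratings.dat
data = "1::10::5::1000000000\n1::20::3::1000000100\n2::10::4::1000000200\n"
header = ['user_id', 'item_id', 'rating', 'timestamp']

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of suppressing repeats
    df = pd.read_csv(io.StringIO(data), sep='::', names=header, engine='python')

# Keep only warnings raised by the CSV parser
parser_warnings = [w for w in caught if issubclass(w.category, ParserWarning)]
print(len(parser_warnings))  # 0
print(df.shape)              # (3, 4)
```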