import os
import zipfile
import pandas as pd
import numpy as np
DATASET = 'ml-1m'
RAW_PATH = os.path.join('./', DATASET)
with zipfile.ZipFile(os.path.join(RAW_PATH, DATASET + '.zip')) as z:  # assumes ./ml-1m/ml-1m.zip already exists
    with z.open(os.path.join(DATASET, 'movies.dat')) as f:
        data_df = pd.read_csv(f, sep=b'::', header=None, engine='python')  # each row looks like 1::Toy Story (1995)::Animation|Children's|Comedy
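The last two lines of code (the z.open and pd.read_csv calls) are the problem; running them produced three errors in turn.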
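First, KeyError: "There is no item named 'ml-1m\\movies.dat' in the archive". On Windows, os.path.join joins with a backslash, whereas member names inside a zip archive use forward slashes, so the path probably needs converting. Fix: pass os.path.join(DATASET, 'movies.dat').replace('\\', '/') to z.open.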
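The separator mismatch can be reproduced without the MovieLens download. The sketch below builds a tiny in-memory archive (its single row is illustrative only) and shows that the member name is stored with a forward slash. It builds the name with posixpath.join, which is a substitution for the os.path.join(...).replace(...) fix described above, not the same call; both should yield 'ml-1m/movies.dat'.

```python
import io
import posixpath
import zipfile

DATASET = 'ml-1m'

# Build a small in-memory archive; the single row is illustrative only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('ml-1m/movies.dat',
               "1::Toy Story (1995)::Animation|Children's|Comedy\n")
buf.seek(0)

# posixpath.join always joins with '/', regardless of the operating system.
member = posixpath.join(DATASET, 'movies.dat')

with zipfile.ZipFile(buf) as z:
    names = z.namelist()          # names exactly as stored in the archive
    with z.open(member) as f:     # succeeds because the separators match
        first_line = f.readline()

print(names)
print(first_line)
```

Printing z.namelist() on the real archive is a quick way to check the exact member names before calling z.open.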
Next, TypeError: cannot use a bytes pattern on a string-like object. Fix: use sep='::', i.e. drop the b prefix.
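The same TypeError can be triggered with the standard re module alone, which suggests the message comes from applying a bytes regular expression to text that has already been decoded. This is a minimal sketch of that mismatch, not a trace of what pandas does internally; the sample row is the one quoted in the code comment earlier.

```python
import re

line = "1::Toy Story (1995)::Animation|Children's|Comedy"

# A bytes pattern cannot be applied to a str, so this raises TypeError.
try:
    re.split(b'::', line)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A str pattern splits the row into its three fields.
fields = re.split('::', line)

print(error_message)
print(fields)
```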
Finally, UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte. Fix: add the argument encoding='ISO-8859-1' to read_csv.
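Byte 0xe9 is the letter é in ISO-8859-1, while in UTF-8 it is a lead byte that must be followed by continuation bytes, which would explain the failure. A small sketch with a made-up title (an assumption for illustration, not a row taken from the dataset):

```python
# Hypothetical title, encoded as ISO-8859-1; 0xe9 is the single byte for 'é'.
raw = b'L\xe9on (1994)'

# Decoding as UTF-8 fails because 0xe9 is not followed by continuation bytes.
try:
    raw.decode('utf-8')
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason

# Decoding as ISO-8859-1 maps every byte to exactly one character.
text = raw.decode('ISO-8859-1')

print(reason)
print(text)
```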