Reading delimited table data
Sample of the data format:
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
1::594::4::978302268
1::919::4::978301368
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames, engine='python')
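rnames holds the column names for the table and sep is the field delimiter. A multi-character delimiter such as '::' is treated as a regular expression, which is why engine='python' is passed.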
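A minimal sketch of the same call on an in-memory sample, assuming the three rows below stand in for ml-1m/ratings.dat (io.StringIO is used only so the example runs without the file):

```python
import io
import pandas as pd

# Three rows in the same '::'-delimited layout (stand-in for the real file).
sample = "1::1193::5::978300760\n1::661::3::978302109\n1::914::3::978301968\n"

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
# header=None because the file has no header row; names supplies the labels.
ratings = pd.read_table(io.StringIO(sample), sep='::', header=None,
                        names=rnames, engine='python')
print(ratings)
```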
Selecting rows (first five)
ratings[:5]
Merging tables horizontally (join)
pd.merge(ratings, users)
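A small sketch of what the merge does, assuming ratings and users share a user_id column (the miniature tables here are hypothetical):

```python
import pandas as pd

# Hypothetical miniature tables; only the shared column name 'user_id' matters.
ratings = pd.DataFrame({'user_id': [1, 1, 2],
                        'movie_id': [1193, 661, 1357],
                        'rating': [5, 3, 5]})
users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})

# With no `on` argument, merge joins on the columns both frames share
# (here 'user_id') and performs an inner join by default.
merged = pd.merge(ratings, users)
print(merged)
```

Passing on='user_id' explicitly makes the join key visible and avoids surprises if the tables later gain another column with a shared name.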
Aggregating data
data.pivot_table('rating',index='title', columns='gender', aggfunc='mean')
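For the rows of data that share a title, this computes the mean of the rating column separately for each gender, giving one row per title and one column per gender. mean is the average (use std for the standard deviation).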
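A minimal sketch on a toy table, assuming data has title, gender and rating columns (the values are made up for illustration):

```python
import pandas as pd

# Made-up ratings: title 'B' has no male ratings, so that cell becomes NaN.
data = pd.DataFrame({
    'title': ['A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'M', 'F', 'F'],
    'rating': [4, 2, 4, 5, 3],
})

# Rows are titles, columns are genders, cells are mean ratings.
mean_ratings = data.pivot_table('rating', index='title',
                                columns='gender', aggfunc='mean')
print(mean_ratings)
```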
Counting rows per group
data.groupby('title').size()
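Filtering data
ratings_index = ratings_by_title.index[ratings_by_title >= 250]  # index labels of titles with at least 250 ratings
mean_data = mean_data.loc[ratings_index]  # select rows by label (.ix was removed in pandas 1.0; use .loc)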
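The counting and filtering steps fit together as sketched below, assuming ratings_by_title is the result of data.groupby('title').size(). The toy data and the threshold of 2 are chosen only for illustration:

```python
import pandas as pd

data = pd.DataFrame({
    'title': ['A', 'A', 'A', 'B', 'C', 'C'],
    'rating': [4, 2, 3, 5, 1, 3],
})

# Number of ratings per title (a Series indexed by title).
ratings_by_title = data.groupby('title').size()
# Mean rating per title, as a one-column DataFrame (hypothetical column name).
mean_data = data.groupby('title')['rating'].mean().to_frame('mean_rating')

# Keep only titles with at least 2 ratings; the notes above use 250.
ratings_index = ratings_by_title.index[ratings_by_title >= 2]
mean_data = mean_data.loc[ratings_index]
print(mean_data)
```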
Sorting data
mean_data.sort_values(by='F', ascending=False)
by names the column to sort on; ascending=False sorts from largest to smallest.
Computing a new column
mean_data['diff'] = mean_data['M'] - mean_data['F']
Reversing row order
sort_by_diff[::-1]
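A short sketch tying the last three steps together, assuming sort_by_diff is the result of sorting mean_data by the new diff column (the notes do not show that step, and the values here are made up):

```python
import pandas as pd

# Made-up mean ratings per title for female (F) and male (M) viewers.
mean_data = pd.DataFrame({'F': [4.5, 3.0, 3.8], 'M': [3.5, 4.0, 3.9]},
                         index=['A', 'B', 'C'])

# New column: positive means men rated the title higher than women.
mean_data['diff'] = mean_data['M'] - mean_data['F']

# sort_values returns a new, sorted DataFrame (ascending by default).
sort_by_diff = mean_data.sort_values(by='diff')

# A step of -1 reverses the row order, so the largest diff comes first.
reversed_rows = sort_by_diff[::-1]
print(reversed_rows)
```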
Summing by group
names1880.groupby('sex').births.sum()
Group the rows by the sex column, then sum the births column within each group.
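A minimal sketch of the grouped sum. The two F rows reuse values from the sample data in these notes, and the two M rows are hypothetical:

```python
import pandas as pd

names1880 = pd.DataFrame({
    'name': ['Mary', 'Anna', 'X', 'Y'],   # 'X' and 'Y' are placeholder names
    'sex': ['F', 'F', 'M', 'M'],
    'births': [7065, 2604, 100, 50],      # M values are made up
})

# One total per distinct value of 'sex'; the result is a Series indexed by sex.
totals = names1880.groupby('sex').births.sum()
print(totals)
```

Writing names1880.groupby('sex')['births'].sum() is equivalent and also works when the column name is not a valid Python attribute.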
Reading CSV
Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
names1880 = pd.read_csv('./yob1880.txt', names=['name', 'sex', 'births'])
names specifies the column names, since the file has no header row.
Concatenating data vertically
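pieces = []
names1880['year'] = 1880  # tag the 1880 rows too; otherwise their year is NaN after concat
pieces.append(names1880)
names1881 = pd.read_csv('./yob1881.txt', names=['name', 'sex', 'births'])
names1881['year'] = 1881
pieces.append(names1881)
names = pd.concat(pieces, ignore_index=True)  # ignore_index=True renumbers the rows 0..n-1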
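The same pattern can be written as a loop over years. This is a sketch under stated assumptions: the in-memory strings stand in for the yobNNNN.txt files, and the 1881 values are hypothetical:

```python
import io
import pandas as pd

# Stand-ins for yob1880.txt and yob1881.txt (1881 values are made up).
files = {
    1880: "Mary,F,7065\nAnna,F,2604\n",
    1881: "Mary,F,10\nAnna,F,20\n",
}

pieces = []
for year, text in files.items():
    frame = pd.read_csv(io.StringIO(text), names=['name', 'sex', 'births'])
    frame['year'] = year          # record which file each row came from
    pieces.append(frame)

# Stack the frames row-wise and build a fresh 0..n-1 index.
names = pd.concat(pieces, ignore_index=True)
print(names)
```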
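Converting data to floats
Needed in Python 2 so that dividing integers does not truncate (not needed in Python 3, where / is true division). astype returns a new object, so assign the result if it is needed later.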
births.astype(float)
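A brief sketch of the conversion, assuming births is an integer Series (the values reuse the sample data in these notes):

```python
import pandas as pd

births = pd.Series([7065, 2604, 2003])

# astype does not modify births in place; it returns a converted copy.
as_float = births.astype(float)

# Example use: each value as a share of the total.
share = as_float / as_float.sum()
print(share)
```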