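- Basic features
A tabular data structure;
Contains an ordered set of columns (the column labels behave like an index);
Can be roughly viewed as a collection of Series that share the same index.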
import pandas as pd
data = {'name':['Wangdachui', 'Linling', 'Niuyun'], 'pay':[4000,5000,6000]} # name and pay become the column labels
frame = pd.DataFrame(data)
frame
Out[24]:
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000
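The "collection of Series sharing one index" view can be checked directly. The following is a minimal sketch, assuming the same three employees as above; the variable names `name`, `pay`, and `pay_column` are illustrative only.

```python
import pandas as pd

# Two Series built on the same default index (labels 0, 1, 2).
name = pd.Series(['Wangdachui', 'Linling', 'Niuyun'])
pay = pd.Series([4000, 5000, 6000])

# A dict of Series becomes a DataFrame; the dict keys become the column labels.
frame = pd.DataFrame({'name': name, 'pay': pay})

# Each column comes back as a Series that shares the frame's row index.
pay_column = frame['pay']
print(type(pay_column).__name__)             # Series
print(pay_column.index.equals(frame.index))  # True
```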
- DataFrame index and values
import numpy as np
data = np.array([('Wangdachui',4000), ('Linling',5000), ('Niuyun',6000)]) # a NumPy array has a single dtype, so the pay values are stored as strings
frame = pd.DataFrame(data, index=range(1,4), columns=['name','pay'])
frame
Out[27]:
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
frame.index # row index
Out[28]: RangeIndex(start=1, stop=4, step=1)
frame.columns # column index
Out[29]: Index(['name', 'pay'], dtype='object')
frame.values # values
Out[30]:
array([['Wangdachui', '4000'],
       ['Linling', '5000'],
       ['Niuyun', '6000']], dtype=object)
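Note that the pay values in Out[30] are quoted: they are strings, because the NumPy array could hold only one dtype. A minimal sketch of two possible ways to get numeric pay, assuming the same data; `records` and `frame2` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# NumPy stores one dtype per array, so the integers are coerced to strings here.
data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])

# Option 1: convert the column after construction.
frame['pay'] = frame['pay'].astype(int)

# Option 2: skip the NumPy array and pass the list of tuples directly,
# so pandas infers a dtype for each column separately.
records = [('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)]
frame2 = pd.DataFrame(records, index=range(1, 4), columns=['name', 'pay'])

print(frame['pay'].sum())                            # 15000
print(pd.api.types.is_integer_dtype(frame2['pay']))  # True
```

With numeric pay, arithmetic and comparisons such as `frame.pay >= 5000` work as expected instead of comparing strings.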
- Basic DataFrame operations
o. Selecting a column or a row of a DataFrame yields a Series:
frame
Out[27]:
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
frame['name'] # option 1: bracket indexing by column label
Out[31]:
1    Wangdachui
2       Linling
3        Niuyun
Name: name, dtype: object
frame.pay # option 2: attribute access
Out[32]:
1    4000
2    5000
3    6000
Name: pay, dtype: object
frame.iloc[:2, 1] # first axis is rows, second is columns: rows at positions 0 and 1, column at position 1
Out[33]:
1    4000
2    5000
Name: pay, dtype: object
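The transcript above only pulls out columns. A minimal sketch of selecting a whole row, assuming the same frame with row labels 1 to 3 (built here from a dict so that pay stays numeric); `row_by_label` and `row_by_position` are illustrative names.

```python
import pandas as pd

data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
frame = pd.DataFrame(data, index=range(1, 4))

row_by_label = frame.loc[1]      # row whose index label is 1
row_by_position = frame.iloc[0]  # first row by position (its label is also 1 here)

# A single row is also a Series; its index is the frame's column labels.
print(type(row_by_label).__name__)           # Series
print(row_by_label['name'])                  # Wangdachui
print(row_by_label.equals(row_by_position))  # True
```

`loc` works with labels and `iloc` with integer positions, which matters here because the row labels start at 1 rather than 0.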
o. Modifying and deleting DataFrame contents
frame['name'] = 'admin' # modify: assigns 'admin' to every row of the column
frame
Out[35]:
    name   pay
1  admin  4000
2  admin  5000
3  admin  6000
del frame['pay'] # delete the column in place
frame
Out[37]:
    name
1  admin
2  admin
3  admin
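Assigning to a column overwrites every row, and `del` changes the frame in place. As a hedged sketch of two gentler alternatives, assuming the same data: `loc` can change a single cell, and `drop` returns a new frame without touching the original. The value 5500 and the name `without_pay` are made up for illustration.

```python
import pandas as pd

data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
frame = pd.DataFrame(data, index=range(1, 4))

# Change one cell: row label 2, column 'pay'.
frame.loc[2, 'pay'] = 5500

# drop() returns a new DataFrame and leaves the original untouched.
without_pay = frame.drop(columns=['pay'])

print(list(without_pay.columns))  # ['name']
print(list(frame.columns))        # ['name', 'pay']
print(frame.loc[2, 'pay'])        # 5500
```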
- DataFrame statistics
o. Using DataFrame members to find the lowest pay and the high-pay group
import pandas as pd
data = {'name':['Wangdachui','Linling','Niuyun'], 'pay':[4000,5000,6000]}
frame = pd.DataFrame(data)
frame
Out[4]:
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000
frame[frame.pay >= 5000] # boolean mask: keep the rows whose pay is at least 5000
Out[6]:
      name   pay
1  Linling  5000
2   Niuyun  6000
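The transcript shows the high-pay filter but not the lowest pay mentioned in the item title. A minimal sketch of one way to find it, assuming the same frame; `lowest_pay`, `lowest_paid`, and `lowest_row` are illustrative names.

```python
import pandas as pd

data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
frame = pd.DataFrame(data)

lowest_pay = frame['pay'].min()                  # smallest value in the column
lowest_paid = frame[frame['pay'] == lowest_pay]  # every row tied for the minimum
lowest_row = frame.loc[frame['pay'].idxmin()]    # first row holding the minimum, as a Series

print(lowest_pay)                    # 4000
print(lowest_paid['name'].tolist())  # ['Wangdachui']
print(lowest_row['name'])            # Wangdachui
```

The mask form keeps all rows when several people share the lowest pay, while `idxmin` returns only the first such row label.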