import numpy as np
import pandas as pd
df = pd.read_csv('C:/Users/admin/Desktop/joyful-pandas-master/joyful-pandas-master/data/table.csv',index_col = 'ID')
df.head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
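I. Single-level Indexing
1.1 The loc method, the iloc method, and the [] operator
- Three common ways to index:
  - iloc: position-based indexing;
  - loc: label-based indexing;
  - []: a convenient shorthand whose meaning depends on the key.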
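Here is a minimal sketch of how the three differ. It uses a small made-up frame (toy) whose integer labels are not the same as its positions. The names and values are for illustration only.

```python
import pandas as pd

# Hypothetical toy frame: integer labels (IDs) that differ from positions 0..3
toy = pd.DataFrame({'Math': [34.0, 32.5, 87.2, 80.4]},
                   index=pd.Index([1101, 1102, 1103, 1104], name='ID'))

by_label = toy.loc[1102:1103]     # label-based: both endpoints included
by_position = toy.iloc[1:3]       # position-based: right endpoint excluded
by_column = toy['Math']           # [] with a column name returns that column

print(by_label.index.tolist())     # [1102, 1103]
print(by_position.index.tolist())  # [1102, 1103]
print(by_column.name)              # Math
```

Both slices return the same two rows only because the label range 1102 to 1103 happens to cover positions 1 and 2.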
1.1.1 The loc method (note: every slice used inside loc includes its right endpoint!)
1. Single-row indexing
df.loc[1103]
School S_1
Class C_1
Gender M
Address street_2
Height 186
Weight 82
Math 87.2
Physics B+
Name: 1103, dtype: object
2. Multi-row indexing
df.loc[[1102,2304]]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 2304 | S_2 | C_3 | F | street_6 | 164 | 81 | 95.5 | A- |
df.loc[1304:].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1304 | S_1 | C_3 | M | street_2 | 195 | 70 | 85.2 | A |
| 1305 | S_1 | C_3 | F | street_5 | 187 | 69 | 61.7 | B- |
| 2101 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
| 2102 | S_2 | C_1 | F | street_6 | 161 | 61 | 50.6 | B+ |
| 2103 | S_2 | C_1 | M | street_4 | 157 | 61 | 52.5 | B- |
df.loc[2402::-1].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 2402 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
| 2401 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
| 2305 | S_2 | C_3 | M | street_4 | 187 | 73 | 48.9 | B |
| 2304 | S_2 | C_3 | F | street_6 | 164 | 81 | 95.5 | A- |
| 2303 | S_2 | C_3 | F | street_7 | 190 | 99 | 65.9 | C |
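- In Python slice syntax, X[a::b] starts at a and takes every b-th element up to the end of the sequence. If b is negative, the slice walks backwards. With loc, a is a row label (here the ID 2402), not a position, so df.loc[2402::-1] returns the rows from 2402 back to the first row.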
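To make the slicing rule concrete, this sketch compares a plain Python list with loc on a hypothetical sorted ID index. The frame toy is made up for illustration.

```python
import pandas as pd

# Plain Python: start at position 3, step backwards by 1 to the beginning
seq = [10, 20, 30, 40, 50]
print(seq[3::-1])                          # [40, 30, 20, 10]

# With loc the start is a label; on this sorted, made-up index the walk goes backwards
toy = pd.DataFrame({'Math': [34.0, 32.5, 87.2, 80.4]},
                   index=pd.Index([1101, 1102, 1103, 1104], name='ID'))
print(toy.loc[1103::-1].index.tolist())   # [1103, 1102, 1101]
print(toy.loc[1101::2].index.tolist())    # [1101, 1103], every second row
```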
3. Single-column indexing
df.loc[:,'Height'].head()
ID
1101 173
1102 192
1103 186
1104 167
1105 159
Name: Height, dtype: int64
4. Multi-column indexing
df.loc[:,['Height','Math']].head()
| ID | Height | Math |
|---|---|---|
| 1101 | 173 | 34.0 |
| 1102 | 192 | 32.5 |
| 1103 | 186 | 87.2 |
| 1104 | 167 | 80.4 |
| 1105 | 159 | 84.8 |
df.loc[:,'Height':'Math'].head()
| ID | Height | Weight | Math |
|---|---|---|---|
| 1101 | 173 | 63 | 34.0 |
| 1102 | 192 | 73 | 32.5 |
| 1103 | 186 | 82 | 87.2 |
| 1104 | 167 | 81 | 80.4 |
| 1105 | 159 | 64 | 84.8 |
5. Combined row and column indexing
df.loc[1102:2401:3,'Height':'Math'].head()
| ID | Height | Weight | Math |
|---|---|---|---|
| 1102 | 192 | 73 | 32.5 |
| 1105 | 159 | 64 | 84.8 |
| 1203 | 160 | 53 | 58.8 |
| 1301 | 161 | 68 | 31.5 |
| 1304 | 195 | 70 | 85.2 |
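6. Function-based indexing (loc also accepts a callable, which receives the DataFrame and must return a valid indexer)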
df.loc[lambda x : x['Gender']=='M'].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
| 1203 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
| 1301 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
def f(x):
    # loc calls f(df); the returned list of labels selects rows 1101 and 1103
    return [1101,1103]
df.loc[f]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
7. Boolean indexing
df.loc[df['Address'].isin(['street_4','street_7'])].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
| 1202 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
| 1301 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
| 1303 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
| 2101 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
df.loc[[True if i[-1] == '4' or i[-1] == '7' else False for i in df['Address'].values]].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
| 1202 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
| 1301 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
| 1303 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
| 2101 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
1.1.2 The iloc method (note: unlike loc, the right endpoint of a slice is excluded)
1. Single-row indexing
df.iloc[3]
School S_1
Class C_1
Gender F
Address street_2
Height 167
Weight 81
Math 80.4
Physics B-
Name: 1104, dtype: object
2. Multi-row indexing
df.iloc[3:5]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
3. Single-column indexing
df.iloc[:,3].head()
ID
1101 street_1
1102 street_2
1103 street_2
1104 street_2
1105 street_4
Name: Address, dtype: object
4. Multi-column indexing
df.iloc[:,7::-2].head()
| ID | Physics | Weight | Address | Class |
|---|---|---|---|---|
| 1101 | A+ | 63 | street_1 | C_1 |
| 1102 | B+ | 73 | street_2 | C_1 |
| 1103 | B+ | 82 | street_2 | C_1 |
| 1104 | B- | 81 | street_2 | C_1 |
| 1105 | B+ | 64 | street_4 | C_1 |
5. Mixed indexing
df.iloc[3::4,7::-2].head()
| ID | Physics | Weight | Address | Class |
|---|---|---|---|---|
| 1104 | B- | 81 | street_2 | C_1 |
| 1203 | A+ | 53 | street_6 | C_2 |
| 1302 | A- | 57 | street_1 | C_3 |
| 2101 | C | 84 | street_7 | C_1 |
| 2105 | A | 81 | street_4 | C_1 |
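Summary: as shown above, iloc accepts only integers, lists or slices of integers, and boolean lists or NumPy arrays (or a callable that returns one of these). It does not accept a boolean Series, so take the Series' .values first.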
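Here is a sketch of that workaround on a made-up frame: the boolean Series is converted to a NumPy array with .values before it goes to iloc. The exception pandas raises for the Series itself has depended on the index type, so the example catches both ValueError and NotImplementedError rather than assuming one.

```python
import pandas as pd

# Hypothetical scores indexed by integer IDs
toy = pd.DataFrame({'Math': [34.0, 87.2, 80.4, 97.0]},
                   index=pd.Index([1101, 1103, 1104, 1201], name='ID'))
mask = toy['Math'] > 80          # boolean Series aligned to the ID index

picked = toy.iloc[mask.values]   # a plain NumPy boolean array is accepted
print(picked.index.tolist())     # [1103, 1104, 1201]

try:
    toy.iloc[mask]               # passing the Series itself is rejected
    rejected = False
except (ValueError, NotImplementedError):
    # depending on the index type, pandas raises one of these two
    rejected = True
print(rejected)                  # True
```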
1.1.3 The [] operator
1.1.3.1 [] on a Series
1. Single-element indexing
s = pd.Series(df['Math'],index = df.index)
s[1101]
34.0
2. Multi-element indexing (slicing)
s[0:4]
ID
1101 34.0
1102 32.5
1103 87.2
1104 80.4
Name: Math, dtype: float64
3. Function-based indexing
s[lambda x : x.index[16::-6]]
ID
2102 50.6
1301 31.5
1105 84.8
Name: Math, dtype: float64
4. Boolean indexing
s[s>80]
ID
1103 87.2
1104 80.4
1105 84.8
1201 97.0
1302 87.7
1304 85.2
2101 83.3
2205 85.4
2304 95.5
Name: Math, dtype: float64
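Because s has an integer index, [] is ambiguous here. s[1101] is a label lookup, but s[0:4] is a positional slice. This sketch uses a made-up three-element Series and assumes the current pandas behavior for integer indexes. It shows that .loc and .iloc make the intent explicit.

```python
import pandas as pd

# Hypothetical Series with integer labels, like s above
s = pd.Series([34.0, 32.5, 87.2], index=pd.Index([1101, 1102, 1103], name='ID'))

print(s[1101])                      # 34.0, a scalar key is treated as a label
print(s[0:2].index.tolist())        # [1101, 1102], an integer slice is positional
print(s.loc[1101:1102].tolist())    # [34.0, 32.5], explicit labels, end included
print(s.iloc[0:2].tolist())         # [34.0, 32.5], explicit positions, end excluded
```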
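1.1.3.2 [] on a DataFrame
1. Single-row indexing (an integer slice selects rows by position; index.get_loc converts a label into its position)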
df[1:2]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
row = df.index.get_loc(1102)
df[row:row+1]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
2. Multi-row indexing
df[3:5]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.loc[1104:1105]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
3. Multi-column indexing
df[['School','Math']].head()
| ID | School | Math |
|---|---|---|
| 1101 | S_1 | 34.0 |
| 1102 | S_1 | 32.5 |
| 1103 | S_1 | 87.2 |
| 1104 | S_1 | 80.4 |
| 1105 | S_1 | 84.8 |
4. Function-based indexing
df[lambda x :['Math','Physics']].head()
| ID | Math | Physics |
|---|---|---|
| 1101 | 34.0 | A+ |
| 1102 | 32.5 | B+ |
| 1103 | 87.2 | B+ |
| 1104 | 80.4 | B- |
| 1105 | 84.8 | B+ |
5. Boolean indexing (combine conditions with &, | and ~, wrapping each condition in parentheses)
df[df['Gender'] == 'F'].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
| 1202 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
| 1204 | S_1 | C_2 | F | street_5 | 162 | 63 | 33.8 | B |
df[(df['Gender'] == 'F') & (df['Address'] == 'street_2')].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 2401 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
| 2404 | S_2 | C_4 | F | street_2 | 160 | 84 | 67.7 | B |
df[(df['Math'] > 85) | (df['Address'] == 'street_7')].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
| 1302 | S_1 | C_3 | F | street_1 | 175 | 57 | 87.7 | A- |
| 1303 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
| 1304 | S_1 | C_3 | M | street_2 | 195 | 70 | 85.2 | A |
df[~((df['Math'] > 75) | (df['Address'] == 'street_1'))].head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1202 | S_1 | C_2 | F | street_4 | 176 | 94 | 63.5 | B- |
| 1203 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
| 1204 | S_1 | C_2 | F | street_5 | 162 | 63 | 33.8 | B |
| 1205 | S_1 | C_2 | F | street_6 | 167 | 63 | 68.4 | B- |
df.loc[df['Math']>60,(df[:8]['Address'] == 'street_6').values].head()
| ID | Physics |
|---|---|
| 1103 | B+ |
| 1104 | B- |
| 1105 | B+ |
| 1201 | A- |
| 1202 | B- |
df[df['Address'].isin(['street_1','street_4']) & df['Physics'].isin(['A','A+'])]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 2105 | S_2 | C_1 | M | street_4 | 170 | 81 | 34.2 | A |
| 2203 | S_2 | C_2 | M | street_4 | 155 | 91 | 73.8 | A+ |
df[df[['Address','Physics']].isin({'Address':['street_1','street_4'],'Physics':['A','A+']}).all(1)]
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 2105 | S_2 | C_1 | M | street_4 | 170 | 81 | 34.2 | A |
| 2203 | S_2 | C_2 | M | street_4 | 155 | 91 | 73.8 | A+ |
II. Multi-level Indexing
1. Creating a MultiIndex
tuples = [('A','a'),('A','b'),('B','a'),('B','b')]
mul_index = pd.MultiIndex.from_tuples(tuples,names = ('Upper','Lower'))
mul_index
MultiIndex([('A', 'a'),
('A', 'b'),
('B', 'a'),
('B', 'b')],
names=['Upper', 'Lower'])
pd.DataFrame({'Score':['perfect','good','fair','bad']},index = mul_index)
| Upper | Lower | Score |
|---|---|---|
| A | a | perfect |
| | b | good |
| B | a | fair |
| | b | bad |
L1 = list('AABB')
L2 = list('abab')
tuples = list(zip(L1,L2))
mul_index = pd.MultiIndex.from_tuples(tuples,names = ('Upper','Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
| Upper | Lower | Score |
|---|---|---|
| A | a | perfect |
| | b | good |
| B | a | fair |
| | b | bad |
arrays = [['A','A','B','B'],['a','b','a','b']]
mul_index = pd.MultiIndex.from_arrays(arrays, names=('Upper', 'Lower'))
pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
| Upper | Lower | Score |
|---|---|---|
| A | a | perfect |
| | b | good |
| B | a | fair |
| | b | bad |
mul_index
MultiIndex([('A', 'a'),
('A', 'b'),
('B', 'a'),
('B', 'b')],
names=['Upper', 'Lower'])
L1 = ['A','B']
L2 = ['a','b']
pd.MultiIndex.from_product([L1,L2],names = ('Upper','Lower'))
MultiIndex([('A', 'a'),
('A', 'b'),
('B', 'a'),
('B', 'b')],
names=['Upper', 'Lower'])
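The constructors above all produce the same index. This sketch checks that claim. It also uses MultiIndex.from_arrays, which takes one list per level instead of one tuple per row.

```python
import pandas as pd

names = ('Upper', 'Lower')
# Three ways to build the same two-level index
by_tuples = pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'b'), ('B', 'a'), ('B', 'b')], names=names)
by_arrays = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], ['a', 'b', 'a', 'b']], names=names)
by_product = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']], names=names)

# equals() compares the index values
print(by_tuples.equals(by_arrays), by_arrays.equals(by_product))   # True True
print(list(by_product.names))                                      # ['Upper', 'Lower']
```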
df_using_mul = df.set_index(['Class','Address'])
df_using_mul.head()
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_1 | street_1 | S_1 | M | 173 | 63 | 34.0 | A+ |
| | street_2 | S_1 | F | 192 | 73 | 32.5 | B+ |
| | street_2 | S_1 | M | 186 | 82 | 87.2 | B+ |
| | street_2 | S_1 | F | 167 | 81 | 80.4 | B- |
| | street_4 | S_1 | F | 159 | 64 | 84.8 | B+ |
2. Slicing a MultiIndex (sort the index with sort_index() before slicing)
df_using_mul.head()
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_1 | street_1 | S_1 | M | 173 | 63 | 34.0 | A+ |
| | street_2 | S_1 | F | 192 | 73 | 32.5 | B+ |
| | street_2 | S_1 | M | 186 | 82 | 87.2 | B+ |
| | street_2 | S_1 | F | 167 | 81 | 80.4 | B- |
| | street_4 | S_1 | F | 159 | 64 | 84.8 | B+ |
df_using_mul.sort_index().loc['C_2','street_5']
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_2 | street_5 | S_1 | M | 188 | 68 | 97.0 | A- |
| | street_5 | S_1 | F | 162 | 63 | 33.8 | B |
| | street_5 | S_2 | M | 193 | 100 | 39.1 | B |
df_using_mul.sort_index().loc[('C_2','street_6'):('C_3','street_4')]
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_2 | street_6 | S_1 | M | 160 | 53 | 58.8 | A+ |
| | street_6 | S_1 | F | 167 | 63 | 68.4 | B- |
| | street_7 | S_2 | F | 194 | 77 | 68.5 | B+ |
| | street_7 | S_2 | F | 183 | 76 | 85.4 | B |
| C_3 | street_1 | S_1 | F | 175 | 57 | 87.7 | A- |
| | street_2 | S_1 | M | 195 | 70 | 85.2 | A |
| | street_4 | S_1 | M | 161 | 68 | 31.5 | B+ |
| | street_4 | S_2 | F | 157 | 78 | 72.3 | B+ |
| | street_4 | S_2 | M | 187 | 73 | 48.9 | B |
df_using_mul.sort_index().loc[('C_2','street_7'):'C_3'].head()
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_2 | street_7 | S_2 | F | 194 | 77 | 68.5 | B+ |
| | street_7 | S_2 | F | 183 | 76 | 85.4 | B |
| C_3 | street_1 | S_1 | F | 175 | 57 | 87.7 | A- |
| | street_2 | S_1 | M | 195 | 70 | 85.2 | A |
| | street_4 | S_1 | M | 161 | 68 | 31.5 | B+ |
df_using_mul.sort_index().loc[[('C_2','street_7'),('C_3','street_6')]]
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_2 | street_7 | S_2 | F | 194 | 77 | 68.5 | B+ |
| | street_7 | S_2 | F | 183 | 76 | 85.4 | B |
| C_3 | street_6 | S_2 | F | 164 | 81 | 95.5 | A- |
df_using_mul.sort_index().loc[(['C_2','C_3'],['street_4','street_7']),:]
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_2 | street_4 | S_1 | F | 176 | 94 | 63.5 | B- |
| | street_4 | S_2 | M | 155 | 91 | 73.8 | A+ |
| | street_7 | S_2 | F | 194 | 77 | 68.5 | B+ |
| | street_7 | S_2 | F | 183 | 76 | 85.4 | B |
| C_3 | street_4 | S_1 | M | 161 | 68 | 31.5 | B+ |
| | street_4 | S_2 | F | 157 | 78 | 72.3 | B+ |
| | street_4 | S_2 | M | 187 | 73 | 48.9 | B |
| | street_7 | S_1 | M | 188 | 82 | 49.7 | B |
| | street_7 | S_2 | F | 190 | 99 | 65.9 | C |
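Every slice above calls sort_index() first. This sketch shows why, using a deliberately unsorted made-up index. Slicing between tuple bounds needs a lexicographically sorted MultiIndex; otherwise pandas raises UnsortedIndexError, which is a subclass of KeyError, so the example catches KeyError.

```python
import pandas as pd

# Hypothetical two-level index that is deliberately out of order
idx = pd.MultiIndex.from_tuples(
    [('C_2', 'street_1'), ('C_1', 'street_2'), ('C_2', 'street_3'), ('C_1', 'street_1')],
    names=['Class', 'Address'])
frame = pd.DataFrame({'v': [1, 2, 3, 4]}, index=idx)

try:
    frame.loc[('C_1', 'street_1'):('C_2', 'street_3')]
    raised = False
except KeyError:  # pandas.errors.UnsortedIndexError subclasses KeyError
    raised = True
print(raised)                                                              # True

ordered = frame.sort_index()
print(ordered.loc[('C_1', 'street_1'):('C_2', 'street_3'), 'v'].tolist())  # [4, 2, 1, 3]
```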
3. Slice objects in a MultiIndex
L1,L2 = ['A','B','C'],['a','b','c']
mul_index1 = pd.MultiIndex.from_product([L1,L2],names=('Upper','Lower'))
L3,L4 = ['D','E','F'],['d','e','f']
mul_index2 = pd.MultiIndex.from_product([L3,L4],names = ('Big', 'Small'))
df_s = pd.DataFrame(np.random.rand(9,9),index=mul_index1,columns=mul_index2)
df_s
| | Big | D | D | D | E | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|---|---|
| | Small | d | e | f | d | e | f | d | e | f |
| Upper | Lower | | | | | | | | | |
| A | a | 0.164846 | 0.241633 | 0.034336 | 0.796122 | 0.471722 | 0.570825 | 0.623144 | 0.983168 | 0.375642 |
| | b | 0.810188 | 0.300858 | 0.163051 | 0.317953 | 0.506154 | 0.276052 | 0.786276 | 0.560413 | 0.134370 |
| | c | 0.010443 | 0.491651 | 0.691116 | 0.827531 | 0.198097 | 0.516177 | 0.970233 | 0.959832 | 0.146551 |
| B | a | 0.737642 | 0.618927 | 0.299963 | 0.429047 | 0.105024 | 0.349213 | 0.733588 | 0.947679 | 0.398828 |
| | b | 0.641342 | 0.442321 | 0.326563 | 0.702557 | 0.736336 | 0.296721 | 0.228918 | 0.547085 | 0.313883 |
| | c | 0.718083 | 0.563424 | 0.324585 | 0.630863 | 0.731556 | 0.576871 | 0.255530 | 0.478772 | 0.909172 |
| C | a | 0.791173 | 0.920724 | 0.548437 | 0.926331 | 0.845007 | 0.194106 | 0.524524 | 0.393335 | 0.974597 |
| | b | 0.410258 | 0.198171 | 0.938370 | 0.185703 | 0.846197 | 0.168732 | 0.274920 | 0.129650 | 0.758688 |
| | c | 0.458752 | 0.960005 | 0.069176 | 0.197260 | 0.365588 | 0.203271 | 0.492156 | 0.547211 | 0.506749 |
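How does IndexSlice work? pd.IndexSlice (idx below) returns whatever is written inside its brackets, so idx['B':, mask] becomes the tuple (slice('B', None), mask). This lets slice syntax such as 'B': appear inside a nested tuple, where Python would otherwise reject it as a syntax error. In the example, the row key keeps outer-level labels from 'B' onward and, of those, only the rows where the boolean mask df_s['D']['d'] > 0.3 is True. The column key keeps the columns whose sum exceeds 4 (df_s.sum() sums each column).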
idx = pd.IndexSlice
df_s.loc[idx['B':,df_s['D']['d']>0.3],idx[df_s.sum()>4]]
| | Big | D | D | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|
| | Small | d | e | d | e | d | e | f |
| Upper | Lower | | | | | | | |
| B | a | 0.737642 | 0.618927 | 0.429047 | 0.105024 | 0.733588 | 0.947679 | 0.398828 |
| | b | 0.641342 | 0.442321 | 0.702557 | 0.736336 | 0.228918 | 0.547085 | 0.313883 |
| | c | 0.718083 | 0.563424 | 0.630863 | 0.731556 | 0.255530 | 0.478772 | 0.909172 |
| C | a | 0.791173 | 0.920724 | 0.926331 | 0.845007 | 0.524524 | 0.393335 | 0.974597 |
| | b | 0.410258 | 0.198171 | 0.185703 | 0.846197 | 0.274920 | 0.129650 | 0.758688 |
| | c | 0.458752 | 0.960005 | 0.197260 | 0.365588 | 0.492156 | 0.547211 | 0.506749 |
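Since df_s is random, here is a deterministic sketch of IndexSlice on a hypothetical 6x4 frame of consecutive integers. The row key takes outer labels from 'B' onward with inner label 'a', and the column key keeps every 'e' column.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([['A', 'B', 'C'], ['a', 'b']], names=('Upper', 'Lower'))
cols = pd.MultiIndex.from_product([['D', 'E'], ['d', 'e']], names=('Big', 'Small'))
frame = pd.DataFrame(np.arange(24).reshape(6, 4), index=rows, columns=cols)

idx = pd.IndexSlice
print(idx['B':, 'a'])                 # (slice('B', None, None), 'a'), just a tuple
sub = frame.loc[idx['B':, 'a'], idx[:, 'e']]
print(sub.index.tolist())             # [('B', 'a'), ('C', 'a')]
print(sub.values.tolist())            # [[9, 11], [17, 19]]
```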
4. Swapping index levels
df_using_mul.head()
| Class | Address | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_1 | street_1 | S_1 | M | 173 | 63 | 34.0 | A+ |
| | street_2 | S_1 | F | 192 | 73 | 32.5 | B+ |
| | street_2 | S_1 | M | 186 | 82 | 87.2 | B+ |
| | street_2 | S_1 | F | 167 | 81 | 80.4 | B- |
| | street_4 | S_1 | F | 159 | 64 | 84.8 | B+ |
df_using_mul.swaplevel(i=1,j=0,axis = 0).sort_index().head()
| Address | Class | School | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| street_1 | C_1 | S_1 | M | 173 | 63 | 34.0 | A+ |
| | C_2 | S_2 | M | 175 | 74 | 47.2 | B- |
| | C_3 | S_1 | F | 175 | 57 | 87.7 | A- |
| street_2 | C_1 | S_1 | F | 192 | 73 | 32.5 | B+ |
| | C_1 | S_1 | M | 186 | 82 | 87.2 | B+ |
df_muls = df.set_index(['School','Class','Address'])
df_muls.head()
| School | Class | Address | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| S_1 | C_1 | street_1 | M | 173 | 63 | 34.0 | A+ |
| | | street_2 | F | 192 | 73 | 32.5 | B+ |
| | | street_2 | M | 186 | 82 | 87.2 | B+ |
| | | street_2 | F | 167 | 81 | 80.4 | B- |
| | | street_4 | F | 159 | 64 | 84.8 | B+ |
df_muls.reorder_levels([2,0,1],axis = 0).sort_index().head()
| Address | School | Class | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| street_1 | S_1 | C_1 | M | 173 | 63 | 34.0 | A+ |
| | | C_3 | F | 175 | 57 | 87.7 | A- |
| | S_2 | C_2 | M | 175 | 74 | 47.2 | B- |
| street_2 | S_1 | C_1 | F | 192 | 73 | 32.5 | B+ |
| | | C_1 | M | 186 | 82 | 87.2 | B+ |
df_muls.reorder_levels(['Address','School','Class'],axis = 0).sort_index().head()
| Address | School | Class | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| street_1 | S_1 | C_1 | M | 173 | 63 | 34.0 | A+ |
| | | C_3 | F | 175 | 57 | 87.7 | A- |
| | S_2 | C_2 | M | 175 | 74 | 47.2 | B- |
| street_2 | S_1 | C_1 | F | 192 | 73 | 32.5 | B+ |
| | | C_1 | M | 186 | 82 | 87.2 | B+ |
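swaplevel exchanges exactly two levels, while reorder_levels takes the complete new order. For a two-level index they give the same result. Here is a sketch on a made-up three-row frame:

```python
import pandas as pd

# Hypothetical frame with a (Class, Address) row index
idx = pd.MultiIndex.from_tuples([('C_1', 'street_2'), ('C_1', 'street_1'), ('C_2', 'street_1')],
                                names=['Class', 'Address'])
frame = pd.DataFrame({'v': [1, 2, 3]}, index=idx)

swapped = frame.swaplevel(0, 1).sort_index()                         # swap exactly two levels
reordered = frame.reorder_levels(['Address', 'Class']).sort_index()  # give the full new order

print(list(swapped.index.names))   # ['Address', 'Class']
print(swapped.equals(reordered))   # True
print(swapped['v'].tolist())       # [2, 3, 1]
```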
III. Setting the Index
3.1 The index_col parameter
pd.read_csv('C:/Users/admin/Desktop/joyful-pandas-master/joyful-pandas-master/data/table.csv',index_col = ['Address','School']).head()
| Address | School | Class | ID | Gender | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| street_1 | S_1 | C_1 | 1101 | M | 173 | 63 | 34.0 | A+ |
| street_2 | S_1 | C_1 | 1102 | F | 192 | 73 | 32.5 | B+ |
| | S_1 | C_1 | 1103 | M | 186 | 82 | 87.2 | B+ |
| | S_1 | C_1 | 1104 | F | 167 | 81 | 80.4 | B- |
| street_4 | S_1 | C_1 | 1105 | F | 159 | 64 | 84.8 | B+ |
df.head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.reindex(index = [1101,1203,1206,2402])
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173.0 | 63.0 | 34.0 | A+ |
| 1203 | S_1 | C_2 | M | street_6 | 160.0 | 53.0 | 58.8 | A+ |
| 1206 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2402 | S_2 | C_4 | M | street_7 | 166.0 | 82.0 | 48.7 | B |
df.reindex(columns=['Height','Gender','Average']).head()
| ID | Height | Gender | Average |
|---|---|---|---|
| 1101 | 173 | M | NaN |
| 1102 | 192 | F | NaN |
| 1103 | 186 | M | NaN |
| 1104 | 167 | F | NaN |
| 1105 | 159 | F | NaN |
df.reindex(index = [1101,1203,1206,2402],method = 'bfill')
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1203 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
| 1206 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
| 2402 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
df.reindex(index = [1101,1203,1206,2402],method = 'nearest')
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1203 | S_1 | C_2 | M | street_6 | 160 | 53 | 58.8 | A+ |
| 1206 | S_1 | C_2 | F | street_6 | 167 | 63 | 68.4 | B- |
| 2402 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
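The outputs above follow from how reindex fills missing labels. Label 1206 does not exist, so plain reindexing gives NaN, which is why the integer columns turned into floats. With 'bfill' it copies the next existing label (1301), and with 'nearest' it copies the closest one (1205). Here is a sketch of the same rules on a hypothetical Series, with 'ffill' added for comparison:

```python
import pandas as pd

# Hypothetical sorted labels; 1206 and 1300 are missing from the index
s = pd.Series([10, 20, 30], index=[1101, 1205, 1301])
targets = [1101, 1206, 1300]

print(s.reindex(targets).tolist())                    # [10.0, nan, nan], NaN forces float
print(s.reindex(targets, method='bfill').tolist())    # [10, 30, 30], next label 1301
print(s.reindex(targets, method='ffill').tolist())    # [10, 20, 20], previous label 1205
print(s.reindex(targets, method='nearest').tolist())  # [10, 20, 30], closest label
```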
df_temp = pd.DataFrame({'Weight':np.zeros(5),
'Height':np.zeros(5),
'ID':[1101,1104,1103,1106,1102]}).set_index('ID')
df_temp.reindex_like(df[0:5][['Weight','Height']])
| ID | Weight | Height |
|---|---|---|
| 1101 | 0.0 | 0.0 |
| 1102 | 0.0 | 0.0 |
| 1103 | 0.0 | 0.0 |
| 1104 | 0.0 | 0.0 |
| 1105 | NaN | NaN |
df_temp1 = pd.DataFrame({'Weight':range(5),
'Height':range(5),
'ID':[1101,1104,1103,1106,1102]}).set_index('ID').sort_index()
df_temp1.reindex_like(df[0:5][['Weight','Height']],method='bfill')
| ID | Weight | Height |
|---|---|---|
| 1101 | 0 | 0 |
| 1102 | 4 | 4 |
| 1103 | 2 | 2 |
| 1104 | 1 | 1 |
| 1105 | 3 | 3 |
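df_temp1 is sorted before reindex_like is called with method='bfill'. This sketch shows why, using hypothetical frames: fill methods need a monotonic index, and an unsorted one raises ValueError.

```python
import pandas as pd

template = pd.DataFrame({'w': [0.0, 0.0, 0.0]}, index=[1, 2, 3])
source = pd.DataFrame({'w': [5.0, 9.0, 7.0]}, index=[4, 1, 3])   # hypothetical, unsorted

try:
    source.reindex_like(template, method='bfill')
    raised = False
except ValueError:            # fill methods require a monotonic index
    raised = True
print(raised)                 # True

filled = source.sort_index().reindex_like(template, method='bfill')
print(filled['w'].tolist())   # [9.0, 7.0, 7.0]
```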
df.head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.set_index('Class').head()
| Class | School | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|
| C_1 | S_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| C_1 | S_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| C_1 | S_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| C_1 | S_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| C_1 | S_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.set_index('Class',append= True).head()
| ID | Class | School | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | C_1 | S_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | C_1 | S_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | C_1 | S_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | C_1 | S_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | C_1 | S_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.set_index(pd.Series(range(df.shape[0]))).head()
| | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 2 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 3 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 4 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.set_index([pd.Series(range(df.shape[0])),pd.Series(np.ones(df.shape[0]))]).head()
| | | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1 | 1.0 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 2 | 1.0 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 3 | 1.0 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 4 | 1.0 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.reset_index().head()
| | ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1 | 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 2 | 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 3 | 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 4 | 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
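set_index(..., append=True) and reset_index(level=...) undo each other. Here is a sketch on a made-up two-row frame:

```python
import pandas as pd

toy = pd.DataFrame({'Class': ['C_1', 'C_2'], 'Math': [34.0, 97.0]},
                   index=pd.Index([1101, 1201], name='ID'))

stacked = toy.set_index('Class', append=True)   # index becomes (ID, Class)
restored = stacked.reset_index(level='Class')   # Class goes back to a column

print(list(stacked.index.names))   # ['ID', 'Class']
print(restored.equals(toy))        # True
```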
L1,L2 = ['A','B','C'],['a','b','c']
mul_index1 = pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
L3,L4 = ['D','E','F'],['d','e','f']
mul_index2 = pd.MultiIndex.from_product([L3,L4],names=('Big', 'Small'))
df_temp = pd.DataFrame(np.random.rand(9,9),index=mul_index1,columns=mul_index2)
df_temp.head()
| | Big | D | D | D | E | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|---|---|
| | Small | d | e | f | d | e | f | d | e | f |
| Upper | Lower | | | | | | | | | |
| A | a | 0.277061 | 0.229405 | 0.733455 | 0.273164 | 0.121625 | 0.647950 | 0.959101 | 0.533145 | 0.485486 |
| | b | 0.806594 | 0.844824 | 0.343062 | 0.449254 | 0.505103 | 0.668366 | 0.476697 | 0.099637 | 0.130089 |
| | c | 0.021276 | 0.037226 | 0.516542 | 0.878164 | 0.845835 | 0.315555 | 0.562546 | 0.957017 | 0.571085 |
| B | a | 0.869212 | 0.786385 | 0.856268 | 0.792681 | 0.425281 | 0.755875 | 0.453019 | 0.342979 | 0.610428 |
| | b | 0.663581 | 0.509840 | 0.437187 | 0.876060 | 0.583025 | 0.527439 | 0.491403 | 0.894846 | 0.520632 |
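Use the level parameter to choose which index level is reset, and col_level to choose which column level receives the new column's name. The remaining column levels are filled with col_fill (an empty string by default), which is where the ('', 'Lower') column comes from.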
df_temp1 = df_temp.reset_index(level = 1,col_level = 1)
df_temp1.head()
| Big | | D | D | D | E | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|---|---|
| Small | Lower | d | e | f | d | e | f | d | e | f |
| Upper | | | | | | | | | | |
| A | a | 0.277061 | 0.229405 | 0.733455 | 0.273164 | 0.121625 | 0.647950 | 0.959101 | 0.533145 | 0.485486 |
| A | b | 0.806594 | 0.844824 | 0.343062 | 0.449254 | 0.505103 | 0.668366 | 0.476697 | 0.099637 | 0.130089 |
| A | c | 0.021276 | 0.037226 | 0.516542 | 0.878164 | 0.845835 | 0.315555 | 0.562546 | 0.957017 | 0.571085 |
| B | a | 0.869212 | 0.786385 | 0.856268 | 0.792681 | 0.425281 | 0.755875 | 0.453019 | 0.342979 | 0.610428 |
| B | b | 0.663581 | 0.509840 | 0.437187 | 0.876060 | 0.583025 | 0.527439 | 0.491403 | 0.894846 | 0.520632 |
df_temp1.columns
MultiIndex([( '', 'Lower'),
('D', 'd'),
('D', 'e'),
('D', 'f'),
('E', 'd'),
('E', 'e'),
('E', 'f'),
('F', 'd'),
('F', 'e'),
('F', 'f')],
names=['Big', 'Small'])
df_temp1.index
Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], dtype='object', name='Upper')
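Here is a smaller, deterministic sketch of the same call on a made-up frame. Resetting the Lower level with col_level=1 puts its name in the second column level and fills the first level with '':

```python
import pandas as pd

rows = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']], names=('Upper', 'Lower'))
cols = pd.MultiIndex.from_product([['D'], ['d', 'e']], names=('Big', 'Small'))
frame = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=rows, columns=cols)

out = frame.reset_index(level='Lower', col_level=1)   # col_fill defaults to ''
print(out.columns.tolist())   # [('', 'Lower'), ('D', 'd'), ('D', 'e')]
print(out.index.tolist())     # ['A', 'A', 'B', 'B']
```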
df_temp.rename_axis(index = {'Lower':'Lower1'},columns={'Big':'Big1'})
| | Big1 | D | D | D | E | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|---|---|
| | Small | d | e | f | d | e | f | d | e | f |
| Upper | Lower1 | | | | | | | | | |
| A | a | 0.277061 | 0.229405 | 0.733455 | 0.273164 | 0.121625 | 0.647950 | 0.959101 | 0.533145 | 0.485486 |
| | b | 0.806594 | 0.844824 | 0.343062 | 0.449254 | 0.505103 | 0.668366 | 0.476697 | 0.099637 | 0.130089 |
| | c | 0.021276 | 0.037226 | 0.516542 | 0.878164 | 0.845835 | 0.315555 | 0.562546 | 0.957017 | 0.571085 |
| B | a | 0.869212 | 0.786385 | 0.856268 | 0.792681 | 0.425281 | 0.755875 | 0.453019 | 0.342979 | 0.610428 |
| | b | 0.663581 | 0.509840 | 0.437187 | 0.876060 | 0.583025 | 0.527439 | 0.491403 | 0.894846 | 0.520632 |
| | c | 0.397011 | 0.203262 | 0.091956 | 0.976940 | 0.565409 | 0.710670 | 0.990769 | 0.054233 | 0.120749 |
| C | a | 0.080383 | 0.706350 | 0.456951 | 0.191387 | 0.028303 | 0.002020 | 0.123343 | 0.940375 | 0.189775 |
| | b | 0.702715 | 0.394502 | 0.157176 | 0.568932 | 0.079573 | 0.135915 | 0.681615 | 0.209627 | 0.908920 |
| | c | 0.110645 | 0.875330 | 0.989039 | 0.966466 | 0.249008 | 0.192039 | 0.474032 | 0.068912 | 0.130035 |
df_temp.rename(index={'A':'T'},columns={'e':'e+'}).head()
| | Big | D | D | D | E | E | E | F | F | F |
|---|---|---|---|---|---|---|---|---|---|---|
| | Small | d | e+ | f | d | e+ | f | d | e+ | f |
| Upper | Lower | | | | | | | | | |
| T | a | 0.277061 | 0.229405 | 0.733455 | 0.273164 | 0.121625 | 0.647950 | 0.959101 | 0.533145 | 0.485486 |
| | b | 0.806594 | 0.844824 | 0.343062 | 0.449254 | 0.505103 | 0.668366 | 0.476697 | 0.099637 | 0.130089 |
| | c | 0.021276 | 0.037226 | 0.516542 | 0.878164 | 0.845835 | 0.315555 | 0.562546 | 0.957017 | 0.571085 |
| B | a | 0.869212 | 0.786385 | 0.856268 | 0.792681 | 0.425281 | 0.755875 | 0.453019 | 0.342979 | 0.610428 |
| | b | 0.663581 | 0.509840 | 0.437187 | 0.876060 | 0.583025 | 0.527439 | 0.491403 | 0.894846 | 0.520632 |
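rename_axis changes the names of the index levels, whereas rename changes the labels themselves. Here is a sketch on a hypothetical frame:

```python
import pandas as pd

rows = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']], names=('Upper', 'Lower'))
frame = pd.DataFrame({'v': [1, 2, 3, 4]}, index=rows)

renamed_axis = frame.rename_axis(index={'Lower': 'Lower1'})   # renames a level name
renamed_labels = frame.rename(index={'A': 'T'})                # renames matching labels

print(list(renamed_axis.index.names))                            # ['Upper', 'Lower1']
print(renamed_labels.index.get_level_values('Upper').tolist())   # ['T', 'T', 'B', 'B']
```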
IV. Common Index-based Functions
df.head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.where(df['Gender']=='M').head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173.0 | 63.0 | 34.0 | A+ |
| 1102 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1103 | S_1 | C_1 | M | street_2 | 186.0 | 82.0 | 87.2 | B+ |
| 1104 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1105 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df.where(df['Gender']=='M').dropna().head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173.0 | 63.0 | 34.0 | A+ |
| 1103 | S_1 | C_1 | M | street_2 | 186.0 | 82.0 | 87.2 | B+ |
| 1201 | S_1 | C_2 | M | street_5 | 188.0 | 68.0 | 97.0 | A- |
| 1203 | S_1 | C_2 | M | street_6 | 160.0 | 53.0 | 58.8 | A+ |
| 1301 | S_1 | C_3 | M | street_4 | 161.0 | 68.0 | 31.5 | B+ |
df.where(df["Gender"] == 'M',np.random.rand(df.shape[0],df.shape[1])).head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173.000000 | 63.000000 | 34.000000 | A+ |
| 1102 | 0.370522 | 0.820446 | 0.923173 | 0.973714 | 0.892504 | 0.343564 | 0.104043 | 0.360941 |
| 1103 | S_1 | C_1 | M | street_2 | 186.000000 | 82.000000 | 87.200000 | B+ |
| 1104 | 0.139049 | 0.764316 | 0.932479 | 0.115684 | 0.424560 | 0.076695 | 0.879297 | 0.0701079 |
| 1105 | 0.345771 | 0.623447 | 0.245876 | 0.47349 | 0.454328 | 0.821899 | 0.205652 | 0.20036 |
df.mask(df['Gender']=='M').dropna().head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1102 | S_1 | C_1 | F | street_2 | 192.0 | 73.0 | 32.5 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167.0 | 81.0 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159.0 | 64.0 | 84.8 | B+ |
| 1202 | S_1 | C_2 | F | street_4 | 176.0 | 94.0 | 63.5 | B- |
| 1204 | S_1 | C_2 | F | street_5 | 162.0 | 63.0 | 33.8 | B |
df.mask(df["Gender"] == 'M',np.random.rand(df.shape[0],df.shape[1])).head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | 0.513481 | 0.421454 | 0.294483 | 0.215076 | 0.189442 | 0.294228 | 0.071693 | 0.287266 |
| 1102 | S_1 | C_1 | F | street_2 | 192.000000 | 73.000000 | 32.500000 | B+ |
| 1103 | 0.0712371 | 0.0409435 | 0.888475 | 0.197385 | 0.792995 | 0.350776 | 0.454137 | 0.749055 |
| 1104 | S_1 | C_1 | F | street_2 | 167.000000 | 81.000000 | 80.400000 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159.000000 | 64.000000 | 84.800000 | B+ |
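where keeps values where the condition is True and replaces the rest. The replacement is NaN unless another value is given, which is why the integer columns print as floats above. mask does the opposite. Here is a sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([34.0, 87.2, 80.4])
cond = s > 80

kept = s.where(cond)          # keep where True, NaN elsewhere
hidden = s.mask(cond, 0.0)    # replace where True with 0.0, keep elsewhere

print(kept.tolist())          # [nan, 87.2, 80.4]
print(hidden.tolist())        # [34.0, 0.0, 0.0]
```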
df.head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+ |
df.query('Address in ["street_6","street_7"] & (Weight > (70+10)) & (ID in [1303,2304,2402])')
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1303 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
| 2304 | S_2 | C_3 | F | street_6 | 164 | 81 | 95.5 | A- |
| 2402 | S_2 | C_4 | M | street_7 | 166 | 82 | 48.7 | B |
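Inside a query string, column names and the index name (ID here) can be used directly, and a local Python variable can be referenced with @. Here is a sketch on a hypothetical three-row frame, where limit is an illustrative variable:

```python
import pandas as pd

toy = pd.DataFrame({'Address': ['street_6', 'street_1', 'street_7'],
                    'Weight': [81, 90, 82]},
                   index=pd.Index([2304, 1101, 2402], name='ID'))
limit = 80    # a local variable, referenced inside the query string with @

picked = toy.query('Address in ["street_6", "street_7"] and Weight > @limit and ID > 2000')
print(picked.index.tolist())   # [2304, 2402]
```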
V. Handling Duplicate Elements
5.1 The duplicated method
This method returns a boolean Series that marks whether each row is a duplicate.
df.duplicated('Class').head()
ID
1101 False
1102 True
1103 True
1104 True
1105 True
dtype: bool
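The optional parameter keep defaults to 'first', so the first occurrence is marked as not duplicated. With 'last', the last occurrence is marked as not duplicated instead. With False, every occurrence of a repeated value is marked as duplicated (True).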
df.duplicated('Class',keep='last').tail()
ID
2401 True
2402 True
2403 True
2404 True
2405 False
dtype: bool
df.duplicated('Class',keep = False).head()
ID
1101 True
1102 True
1103 True
1104 True
1105 True
dtype: bool
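Here is a compact sketch of the three keep options on a made-up Series:

```python
import pandas as pd

s = pd.Series(['C_1', 'C_1', 'C_2', 'C_1'])
print(s.duplicated().tolist())             # [False, True, False, True]
print(s.duplicated(keep='last').tolist())  # [True, True, False, False]
print(s.duplicated(keep=False).tolist())   # [True, True, False, True]
```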
5.2 The drop_duplicates method
As the name suggests, it removes duplicate rows; keep works the same way as in duplicated.
df.drop_duplicates('Class')
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
| 1301 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
| 2401 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
df.drop_duplicates('Class',keep='last')
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 2105 | S_2 | C_1 | M | street_4 | 170 | 81 | 34.2 | A |
| 2205 | S_2 | C_2 | F | street_7 | 183 | 76 | 85.4 | B |
| 2305 | S_2 | C_3 | M | street_4 | 187 | 73 | 48.9 | B |
| 2405 | S_2 | C_4 | F | street_6 | 193 | 54 | 47.6 | B |
- Passing several columns is equivalent to treating them together as a multi-level key when looking for duplicates:
df.drop_duplicates(['School','Class'])
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
| 1301 | S_1 | C_3 | M | street_4 | 161 | 68 | 31.5 | B+ |
| 2101 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
| 2201 | S_2 | C_2 | M | street_5 | 193 | 100 | 39.1 | B |
| 2301 | S_2 | C_3 | F | street_4 | 157 | 78 | 72.3 | B+ |
| 2401 | S_2 | C_4 | F | street_2 | 192 | 62 | 45.3 | A |
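This sketch uses a hypothetical frame to contrast one key column with two. It also checks that drop_duplicates gives the same result as filtering with ~duplicated.

```python
import pandas as pd

toy = pd.DataFrame({'School': ['S_1', 'S_1', 'S_2', 'S_2'],
                    'Class': ['C_1', 'C_1', 'C_1', 'C_2'],
                    'Math': [34.0, 32.5, 83.3, 39.1]})

one_col = toy.drop_duplicates('Class')               # each Class value kept once
two_cols = toy.drop_duplicates(['School', 'Class'])  # (School, Class) pairs compared

print(one_col['Math'].tolist())    # [34.0, 39.1]
print(two_cols['Math'].tolist())   # [34.0, 83.3, 39.1]
print(two_cols.equals(toy[~toy.duplicated(['School', 'Class'])]))   # True
```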
VI. Sampling Functions
6.1 n: the sample size
df.sample(n = 5)
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 2201 | S_2 | C_2 | M | street_5 | 193 | 100 | 39.1 | B |
| 2304 | S_2 | C_3 | F | street_6 | 164 | 81 | 95.5 | A- |
| 1205 | S_1 | C_2 | F | street_6 | 167 | 63 | 68.4 | B- |
| 1304 | S_1 | C_3 | M | street_2 | 195 | 70 | 85.2 | A |
| 1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+ |
6.2 frac: the sampling fraction
df.sample(frac = 0.05)
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 2403 | S_2 | C_4 | F | street_6 | 158 | 60 | 59.7 | B+ |
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
6.3 replace: whether to sample with replacement
df.sample(n =df.shape[0],replace = True).head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 2101 | S_2 | C_1 | M | street_7 | 174 | 84 | 83.3 | C |
| 1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B- |
| 2105 | S_2 | C_1 | M | street_4 | 170 | 81 | 34.2 | A |
| 1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+ |
| 1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+ |
df.sample(n=35,replace=True).index.is_unique
False
6.4 axis: the dimension to sample along; the default 0 samples rows, and axis=1 samples columns
df.sample(n=3,axis=1).head()
| ID | Class | Weight | Height |
|---|---|---|---|
| 1101 | C_1 | 63 | 173 |
| 1102 | C_1 | 73 | 192 |
| 1103 | C_1 | 82 | 186 |
| 1104 | C_1 | 81 | 167 |
| 1105 | C_1 | 64 | 159 |
6.5 weights: per-row sample weights, normalized automatically (they do not need to sum to 1)
df.sample(n=3,weights=np.random.rand(df.shape[0])).head()
| ID | School | Class | Gender | Address | Height | Weight | Math | Physics |
|---|---|---|---|---|---|---|---|---|
| 1201 | S_1 | C_2 | M | street_5 | 188 | 68 | 97.0 | A- |
| 1303 | S_1 | C_3 | M | street_7 | 188 | 82 | 49.7 | B |
| 2105 | S_2 | C_1 | M | street_4 | 170 | 81 | 34.2 | A |
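The sampling outputs above are random. This sketch applies the same parameters to a made-up frame. It uses random_state (not used above) to fix the seed so runs are reproducible, and the weights are deliberately left unnormalized.

```python
import pandas as pd

toy = pd.DataFrame({'v': range(5)})

by_n = toy.sample(n=2, random_state=0)                 # fixed sample size
by_frac = toy.sample(frac=0.4, random_state=0)         # 40% of 5 rows gives 2 rows
boot = toy.sample(n=5, replace=True, random_state=0)   # with replacement, rows may repeat
weighted = toy.sample(n=2, weights=[0, 0, 1, 1, 2], random_state=0)  # raw weights

print(len(by_n), len(by_frac), len(boot))   # 2 2 5
print(set(weighted['v']) <= {2, 3, 4})      # True: zero-weight rows are never drawn
```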