CH04 - Pandas 【变形】

import numpy as np
import pandas as pd
df = pd.read_csv('C:/Users/admin/Desktop/joyful-pandas-master/joyful-pandas-master/data/table.csv')
df.head()
SchoolClassIDGenderAddressHeightWeightMathPhysics
0S_1C_11101Mstreet_11736334.0A+
1S_1C_11102Fstreet_21927332.5B+
2S_1C_11103Mstreet_21868287.2B+
3S_1C_11104Fstreet_21678180.4B-
4S_1C_11105Fstreet_41596484.8B+

1、透视表

1.1、pivot

df.pivot(index='ID',columns='Gender',values='Height').head()
GenderFM
ID
1101NaN173.0
1102192.0NaN
1103NaN186.0
1104167.0NaN
1105159.0NaN

1.2、pivot_table

pd.pivot_table(df,index='ID',columns='Gender',values='Height').head()
GenderFM
ID
1101NaN173.0
1102192.0NaN
1103NaN186.0
1104167.0NaN
1105159.0NaN
pd.pivot_table(df,index='ID',columns='Gender',values='Height',aggfunc=['mean','sum']).head()
meansum
GenderFMFM
ID
1101NaN173.0NaN173.0
1102192.0NaN192.0NaN
1103NaN186.0NaN186.0
1104167.0NaN167.0NaN
1105159.0NaN159.0NaN
pd.pivot_table(df,index='School',columns='Gender',values='Height',aggfunc=['mean','sum'],margins = True).head()
meansum
GenderFMAllFMAll
School
S_1173.125000178.714286175.733333138512512636
S_2173.727273172.000000172.950000191115483459
All173.473684174.937500174.142857329627996095
pd.pivot_table(df,index = ['School','Class'],columns = ['Gender','Address'],values = ['Height','Weight'])
Height...Weight
GenderFM...FM
Addressstreet_1street_2street_4street_5street_6street_7street_1street_2street_4street_5...street_4street_5street_6street_7street_1street_2street_4street_5street_6street_7
SchoolClass
S_1C_1NaN179.5159.0NaNNaNNaN173.0186.0NaNNaN...64.0NaNNaNNaN63.082.0NaNNaNNaNNaN
C_2NaNNaN176.0162.0167.0NaNNaNNaNNaN188.0...94.063.063.0NaNNaNNaNNaN68.053.0NaN
C_3175.0NaNNaN187.0NaNNaNNaN195.0161.0NaN...NaN69.0NaNNaNNaN70.068.0NaNNaN82.0
S_2C_1NaNNaNNaN159.0161.0NaNNaNNaN163.5NaN...NaN97.061.0NaNNaNNaN71.0NaNNaN84.0
C_2NaNNaNNaNNaNNaN188.5175.0NaN155.0193.0...NaNNaNNaN76.574.0NaN91.0100.0NaNNaN
C_3NaNNaN157.0NaN164.0190.0NaNNaN187.0171.0...78.0NaN81.099.0NaNNaN73.088.0NaNNaN
C_4NaN176.0NaNNaN175.5NaNNaNNaNNaNNaN...NaNNaN57.0NaNNaNNaNNaNNaNNaN82.0

7 rows × 24 columns

1.3、crosstab 交叉表

pd.crosstab(index = df['Address'],columns=df['Gender'])
GenderFM
Address
street_112
street_242
street_435
street_533
street_651
street_733
pd.crosstab(index=df['Address'],columns=df['Gender'],values=np.random.randint(1,20,df.shape[0]),aggfunc='min')
GenderFM
Address
street_185
street_21011
street_4171
street_535
street_699
street_772
pd.crosstab(index = df['Address'],columns = df['Gender'],normalize='all',margins=True)
GenderFMAll
Address
street_10.0285710.0571430.085714
street_20.1142860.0571430.171429
street_40.0857140.1428570.228571
street_50.0857140.0857140.171429
street_60.1428570.0285710.171429
street_70.0857140.0857140.171429
All0.5428570.4571431.000000

2、其他变形方法

2.1、melt方法

df_m = df[['ID','Gender','Math']]
df_m.head()
IDGenderMath
01101M34.0
11102F32.5
21103M87.2
31104F80.4
41105F84.8
pivoted = df.pivot(index='ID',columns='Gender',values='Math')
result = pivoted.reset_index().melt(id_vars=['ID'],value_vars=['F','M'],value_name='Math')\
                     .dropna().set_index('ID').sort_index()
result.head()
GenderMath
ID
1101M34.0
1102F32.5
1103M87.2
1104F80.4
1105F84.8

2.2、压缩与展开

  • stack()即“堆叠”,作用是将列旋转到行;
  • unstack()即stack()的反操作,将行旋转到列;
  • stack和unstack默认操作为最内层
  • stack和unstack默认旋转轴的级别将会操作最低级别(最内层)
df_s = pd.pivot_table(df,index=['Class','ID'],columns='Gender',values=['Height','Weight'])
df_s.groupby('Class').head(2)
HeightWeight
GenderFMFM
ClassID
C_11101NaN173.0NaN63.0
1102192.0NaN73.0NaN
C_21201NaN188.0NaN68.0
1202176.0NaN94.0NaN
C_31301NaN161.0NaN68.0
1302175.0NaN57.0NaN
C_42401192.0NaN62.0NaN
2402NaN166.0NaN82.0
  • 由于stack和unstack默认是操作最里层的数据,当我们想操作外层的数据时,就要传入一个层级序号或名称来拆分一个不同的层级。
df_stacked = df_s.stack()
df_stacked.groupby('Class').head(2)
HeightWeight
ClassIDGender
C_11101M173.063.0
1102F192.073.0
C_21201M188.068.0
1202F176.094.0
C_31301M161.068.0
1302F175.057.0
C_42401F192.062.0
2402M166.082.0
df_stacked.head()
HeightWeight
ClassIDGender
C_11101M173.063.0
1102F192.073.0
1103M186.082.0
1104F167.081.0
1105F159.064.0
df_unstacked = df_stacked.unstack()
df_unstacked.groupby('Class').head(2)
HeightWeight
GenderFMFM
ClassID
C_11101NaN173.0NaN63.0
1102192.0NaN73.0NaN
C_21201NaN188.0NaN68.0
1202176.0NaN94.0NaN
C_31301NaN161.0NaN68.0
1302175.0NaN57.0NaN
C_42401192.0NaN62.0NaN
2402NaN166.0NaN82.0

3、哑变量与因子化

3.1、Dummy Variable(哑变量)

df_d = df[['Class','Gender','Weight']]  
df_d.head()
ClassGenderWeight
0C_1M63
1C_1F73
2C_1M82
3C_1F81
4C_1F64
pd.get_dummies(df_d[['Class','Gender']]).join(df_d['Weight']).head() #one-hot编码
Class_C_1Class_C_2Class_C_3Class_C_4Gender_FGender_MWeight
010000163
110001073
210000182
310001081
410001064

3.2、factorize方法

codes,uniques = pd.factorize(['a',None,'c','b','a'],sort=True)
display(codes)
display(uniques)
array([ 0, -1,  2,  1,  0], dtype=int64)



array(['a', 'b', 'c'], dtype=object)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值