apply函数
pandas提供了apply函数方便的处理Series与DataFrame;apply函数支持逐一处理数据集中的每个元素都会执行一次目标函数,把返回值存入结果集中。:
ary = np. array( [ '80公斤' , '83公斤' , '78公斤' , '74公斤' , '84公斤' ] )
s = pd. Series( ary)
def func ( x) :
return x[ : 2 ]
s. apply ( func)
def func ( x) :
x[ pd. isna( x) ] = x. mean( )
return x
ratings. apply ( func, axis= 1 )
排序
Pandas 有两种排序方式,它们分别是按标签与按实际值排序。
import numpy as np
d = { 'Name' : pd. Series( [ 'Tom' , 'James' , 'Ricky' , 'Vin' , 'Steve' , 'Minsu' , 'Jack' , 'Lee' , 'David' , 'Gasper' , 'Betina' , 'Andres' ] ) ,
'Age' : pd. Series( [ 25 , 26 , 25 , 23 , 30 , 29 , 23 , 34 , 40 , 30 , 51 , 46 ] ) ,
'Rating' : pd. Series( [ 4.23 , 3.24 , 3.98 , 2.56 , 3.20 , 4.6 , 3.8 , 3.78 , 2.98 , 4.80 , 4.10 , 3.65 ] ) }
unsorted_df = pd. DataFrame( d)
按标签(行)排序
使用sort_index()方法,通过传递axis参数和排序顺序,可以对DataFrame进行排序。 默认情况下,按照升序对行标签进行排序。
sorted_df= unsorted_df. sort_index( )
print ( sorted_df)
sorted_df = unsorted_df. sort_index( ascending= False )
print ( sorted_df)
按标签(列)排序
sorted_df= unsorted_df. sort_index( axis= 1 )
print ( sorted_df)
按某列值排序
像索引排序一样,sort_values()是按值排序的方法。它接受一个by参数,它将使用要与其排序值的DataFrame的列名称。
sorted_df = unsorted_df. sort_values( by= 'Age' )
print ( sorted_df)
sorted_df = unsorted_df. sort_values( by= [ 'Age' , 'Rating' ] , ascending= [ True , False ] )
print ( sorted_df)
数据合并
concat
concat函数是在pandas的方法,可以根据不同的轴合并数据集。
r = pd. concat( datas, axis= 0 , join= 'outer' , ignore_index= False ,
keys= [ 'x' , 'y' , 'z' ] )
纵向合并:
横向合并:
merge & join
panda具有全功能、高性能的内存连接操作,与SQL之类的关系数据库非常相似。与其他开源实现相比,这些方法的性能要好得多(在某些情况下要好一个数量级以上)
pandas提供了merge函数实现高效的内存链接操作:
pd. merge( left, right, how= 'inner' , on= None , left_on= None , right_on= None , left_index= False , right_index= False )
参数名称 说明 left 接收DataFrame或Series。表示要添加的新数据。无默认。 right 接收DataFrame或Series。表示要添加的新数据。无默认。。 how 接收inner,outer,left,right。表示数据的连接方式。默认为inner。 on 接收string或sequence。表示外键字段名。默认为None。 left_on 接收string或sequence。关联操作时左表中的关联字段名。 right_on 接收string或sequence。关联操作时右表中的关联字段名。 left_index 接收boolean。表示是否将left参数接收数据的index作为连接主键。默认为False。 right_index 接收boolean。表示是否将right参数接收数据的index作为连接主键。默认为False。 sort 接收boolean。表示是否根据连接键对合并后的数据进行排序。默认为False。 suffixes 接收接收tuple。表示用于追加到left和right参数接收数据重叠列名的尾缀默认为(’_x’, ‘_y’)。
合并两个DataFrame:
import pandas as pd
left = pd. DataFrame( {
'student_id' : [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ] ,
'student_name' : [ 'Alex' , 'Amy' , 'Allen' , 'Alice' , 'Ayoung' , 'Billy' , 'Brian' , 'Bran' , 'Bryce' , 'Betty' , 'Emma' , 'Marry' , 'Allen' , 'Jean' , 'Rose' , 'David' , 'Tom' , 'Jack' , 'Daniel' , 'Andrew' ] ,
'class_id' : [ 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 2 ] ,
'gender' : [ 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' ] ,
'age' : [ 20 , 21 , 22 , 20 , 21 , 22 , 23 , 20 , 21 , 22 , 20 , 21 , 22 , 23 , 20 , 21 , 22 , 20 , 21 , 22 ] ,
'score' : [ 98 , 74 , 67 , 38 , 65 , 29 , 32 , 34 , 85 , 64 , 52 , 38 , 26 , 89 , 68 , 46 , 32 , 78 , 79 , 87 ] } )
right = pd. DataFrame(
{ 'class_id' : [ 1 , 2 , 3 , 5 ] ,
'class_name' : [ 'ClassA' , 'ClassB' , 'ClassC' , 'ClassE' ] } )
data = pd. merge( left, right)
print ( data)
其他合并方法同数据库相同:
合并方法 SQL等效 描述 leftLEFT OUTER JOIN使用左侧对象的键 rightRIGHT OUTER JOIN使用右侧对象的键 outerFULL OUTER JOIN使用键的联合 innerINNER JOIN使用键的交集
实验:
rs = pd. merge( left, right, on= 'subject_id' , how= 'right' )
print ( rs)
rs = pd. merge( left, right, on= 'subject_id' , how= 'outer' )
print ( rs)
rs = pd. merge( left, right, on= 'subject_id' , how= 'inner' )
print ( rs)
分组聚合
pandas提供了功能类似于数据库中group by语句的用于拆分数据组的方法pd.groupby();该方法提供的是分组聚合步骤中的拆分功能,能根据索引或字段对数据进行分组(Split) 进而针对得到的多组数据执行聚合操作(Apply),最终合并为最终结果(Combine)。
分组
groupby方法的参数及其说明:
DataFrame. groupby( by= None , axis= 0 , as_index= True , sort= True )
参数名称 说明 by 接收list,string,mapping或generator。用于确定进行分组的依据。无默认。 axis 接收int。表示操作的轴向,默认对行进行操作。默认为0。 as_index 接收boolearn。表示聚合后的聚合标签是否以DataFrame索引形式输出。默认为True。 sort 接收boolearn。表示是否对分组依据分组标签进行排序。默认为True。
用groupby方法分组后的结果并不能直接查看,而是被存在内存中,输出的是内存地址。实际上分组后的数据对象(Groupby对象)类似Series与DataFrame,是pandas提供的一种对象。
Groupby对象的常用方法:
方法 说明 groupObject.get_group(‘A’) 返回A组的详细数据 groupObject.size() 返回每一组的频数
grouped = data. groupby( by= [ 'class_id' , 'gender' ] )
grouped. get_group( ( 1 , 'M' ) )
grouped = data. groupby( by= [ 'class_id' , 'gender' ] )
grouped. get_group( ( 1 , 'M' ) )
聚合
聚合函数为每个组返回聚合值。当创建了分组(groupby )对象,就可以对每个分组的其他字段数据执行求和、求标准差等操作。
使用聚合函数agg进行组内计算:
grouped = data. groupby( by= 'class_id' )
grouped. agg( { 'score' : np. mean} )
对于某个字段希望只做求均值操作,而对另一个字段则希望只做求和操作,可以使用字典的方式,将两个字段名分别作为key:
grouped. agg( { 'age' : np. max , 'score' : np. mean} )
还可以这样:
result = grouped. agg(
{ 'age' : np. max , 'score' : [ np. mean, np. max ] } )
pandas支持的聚合函数有:
方法名称 说明 count 计算分组的数目,包括缺失值。 head 返回每组的前n个值。 max 返回每组最大值。 mean 返回每组的均值。 median 返回每组的中位数。 cumcount 对每个分组中组员的进行标记,0至n-1。 size 返回每组的大小。 min 返回每组最小值。 std 返回每组的标准差。 sum 返回每组的和。
透视表与交叉表
透视表
透视表(pivot table)是各种电子表格程序和其他数据分析软件中一种常见的数据汇总工具。它根据一个或多个键对数据进行分组聚合,并根据每个分组进行数据汇总 。
print ( data. pivot_table( index= [ 'class_id' , 'gender' ] ) )
print ( data. pivot_table( index= [ 'class_id' , 'gender' ] , values= [ 'score' ] ) )
print ( data. pivot_table( index= [ 'class_id' , 'gender' ] , values= [ 'score' ] , columns= [ 'age' ] ) )
print ( data. pivot_table( index= [ 'class_id' , 'gender' ] , values= [ 'score' ] , columns= [ 'age' ] , margins= True ) )
print ( data. pivot_table( index= [ 'class_id' , 'gender' ] , values= [ 'score' ] , columns= [ 'age' ] , margins= True , aggfunc= 'max' ) )
交叉表
交叉表(cross-tabulation, 简称crosstab)是一种用于计算分组频率的特殊透视表 :
print ( pd. crosstab( data. class_id, data. gender, margins= True ) )
项目:分析影响学生成绩的因素
资源文件下载
https://download.youkuaiyun.com/download/yegeli/12562286 在我的资源中下载:StudentsPerformance.csv
字段 说明 gender 性别 race/ethnicity 种族 parental level of education 父母教育水平 lunch 午餐 test preparation course 是否通过预科考试 math score 数学得分 reading score 阅读得分 writing score 写作得分
学生成绩影响因素分析
import numpy as np
import pandas as pd
data = pd. read_csv( 'StudentsPerformance.csv' )
data[ 'total score' ] = data. sum ( axis= 1 )
data. describe( include= [ 'number' , 'object' ] )
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score total score count 1000 1000 1000 1000 1000 1000.00000 1000.000000 1000.000000 1000.000000 unique 2 5 6 2 2 NaN NaN NaN NaN top female group C some college standard none NaN NaN NaN NaN freq 518 319 226 645 642 NaN NaN NaN NaN mean NaN NaN NaN NaN NaN 66.08900 69.169000 68.054000 203.312000 std NaN NaN NaN NaN NaN 15.16308 14.600192 15.195657 42.771978 min NaN NaN NaN NaN NaN 0.00000 17.000000 10.000000 27.000000 25% NaN NaN NaN NaN NaN 57.00000 59.000000 57.750000 175.000000 50% NaN NaN NaN NaN NaN 66.00000 70.000000 69.000000 205.000000 75% NaN NaN NaN NaN NaN 77.00000 79.000000 79.000000 233.000000 max NaN NaN NaN NaN NaN 100.00000 100.000000 100.000000 300.000000
data. pivot_table( index= 'gender' )
math score reading score total score writing score gender female 63.633205 72.608108 208.708494 72.467181 male 68.728216 65.473029 197.512448 63.311203
总体来说,女生的成绩普遍比较好,但是男生更善于数学。
data. pivot_table( index= 'race/ethnicity' )
math score reading score total score writing score race/ethnicity group A 61.629213 64.674157 188.977528 62.674157 group B 63.452632 67.352632 196.405263 65.600000 group C 64.463950 69.103448 201.394984 67.827586 group D 67.362595 70.030534 207.538168 70.145038 group E 73.821429 73.028571 218.257143 71.407143
种族划分(优秀-及格): E - D - C - B - A
r = data. pivot_table( index= 'parental level of education' )
r. sort_values( by= 'total score' , ascending= False )
math score reading score total score writing score parental level of education master's degree 69.745763 75.372881 220.796610 75.677966 bachelor's degree 69.389831 73.000000 215.771186 73.381356 associate's degree 67.882883 70.927928 208.707207 69.896396 some college 67.128319 69.460177 205.429204 68.840708 some high school 63.497207 66.938547 195.324022 64.888268 high school 62.137755 64.704082 189.290816 62.448980
父母受教育水平越高,学习成绩越好。
r = data. pivot_table( index= 'lunch' )
r. sort_values( by= 'total score' , ascending= False )
math score reading score total score writing score lunch standard 70.034109 71.654264 212.511628 70.823256 free/reduced 58.921127 64.653521 186.597183 63.022535
r = data. pivot_table( index= 'test preparation course' )
r. sort_values( by= 'total score' , ascending= False )
math score reading score total score writing score test preparation course completed 69.695531 73.893855 218.008380 74.418994 none 64.077882 66.534268 195.116822 64.504673
r = data. pivot_table( index= [ 'gender' , 'test preparation course' ] )
r
math score reading score total score writing score gender test preparation course female completed 67.195652 77.375000 223.364130 78.793478 none 61.670659 69.982036 200.634731 68.982036 male completed 72.339080 70.212644 212.344828 69.793103 none 66.688312 62.795455 189.133117 59.649351
分析前100名与后100名同学的不同情况
r = data. sort_values( by= 'total score' , ascending= False )
top100 = r. head( 100 )
tail100 = r. tail( 100 )
r1 = pd. DataFrame( { 'top100' : top100[ 'gender' ] . value_counts( ) ,
'tail100' : tail100[ 'gender' ] . value_counts( ) } )
r1
tail100 top100 female 38 66 male 62 34
data[ 'parental level of education' ] . value_counts( )
some college 226
associate's degree 222
high school 196
some high school 179
bachelor's degree 118
master's degree 59
Name: parental level of education, dtype: int64
r2 = pd. DataFrame( { 'top100' : top100[ 'parental level of education' ] . value_counts( ) ,
'tail100' : tail100[ 'parental level of education' ] . value_counts( ) } )
r2
tail100 top100 associate's degree 17 29 bachelor's degree 8 20 high school 32 6 master's degree 1 15 some college 14 21 some high school 28 9
代码总结
import numpy as np
import pandas as pd
ary = np. array( [ '80公斤' , '83公斤' , '78公斤' , '74公斤' , '84公斤' ] )
s = pd. Series( ary)
s
def func ( item) :
return float ( item[ : - 2 ] )
s = s. apply ( func)
s
0 80.0
1 83.0
2 78.0
3 74.0
4 84.0
dtype: float64
ratings = pd. read_json( '../../data/ratings.json' )
def func ( item) :
item[ item. isna( ) ] = item. mean( )
return item
ratings. apply ( func, axis= 0 )
<ipython-input-7-ebdfbe0e051f>:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
item[item.isna()] = item.mean()
John Carson Michelle Peterson William Reynolds Jillian Hobart Melissa Jones Alex Roberts Michael Henry Inception 2.5 3.0 2.50 3.5 3 3.0 3.166667 Pulp Fiction 3.5 3.5 3.00 3.5 4 4.0 4.500000 Anger Management 3.0 1.5 3.25 3.0 2 3.7 3.166667 Fracture 3.5 5.0 3.50 4.0 3 5.0 4.000000 Serendipity 2.5 3.5 3.25 2.5 2 3.5 1.000000 Jerry Maguire 3.0 3.0 4.00 4.5 3 3.0 3.166667
排序
d = { 'Name' : pd. Series( [ 'Tom' , 'James' , 'Ricky' , 'Vin' , 'Steve' , 'Minsu' , 'Jack' , 'Lee' , 'David' , 'Gasper' , 'Betina' , 'Andres' ] ) ,
'Age' : pd. Series( [ 25 , 26 , 25 , 23 , 30 , 29 , 23 , 34 , 40 , 30 , 51 , 46 ] ) ,
'Rating' : pd. Series( [ 4.23 , 3.24 , 3.98 , 2.56 , 3.20 , 4.6 , 3.8 , 3.78 , 2.98 , 4.80 , 4.10 , 3.65 ] ) }
unsorted_df = pd. DataFrame( d)
unsorted_df
Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 7 Lee 34 3.78 8 David 40 2.98 9 Gasper 30 4.80 10 Betina 51 4.10 11 Andres 46 3.65
unsorted_df. sort_index( axis= 1 , ascending= True )
Age Name Rating 0 25 Tom 4.23 1 26 James 3.24 2 25 Ricky 3.98 3 23 Vin 2.56 4 30 Steve 3.20 5 29 Minsu 4.60 6 23 Jack 3.80 7 34 Lee 3.78 8 40 David 2.98 9 30 Gasper 4.80 10 51 Betina 4.10 11 46 Andres 3.65
unsorted_df. sort_index( ascending= False )
Name Age Rating 11 Andres 46 3.65 10 Betina 51 4.10 9 Gasper 30 4.80 8 David 40 2.98 7 Lee 34 3.78 6 Jack 23 3.80 5 Minsu 29 4.60 4 Steve 30 3.20 3 Vin 23 2.56 2 Ricky 25 3.98 1 James 26 3.24 0 Tom 25 4.23
按照某列字段进行排序
unsorted_df. sort_values(
by= [ 'Age' , 'Rating' ] , ascending= [ True , False ] )
Name Age Rating 6 Jack 23 3.80 3 Vin 23 2.56 0 Tom 25 4.23 2 Ricky 25 3.98 1 James 26 3.24 5 Minsu 29 4.60 9 Gasper 30 4.80 4 Steve 30 3.20 7 Lee 34 3.78 8 David 40 2.98 11 Andres 46 3.65 10 Betina 51 4.10
Merge & join
left = pd. DataFrame( {
'student_id' : [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ] ,
'student_name' : [ 'Alex' , 'Amy' , 'Allen' , 'Alice' , 'Ayoung' , 'Billy' , 'Brian' , 'Bran' , 'Bryce' , 'Betty' , 'Emma' , 'Marry' , 'Allen' , 'Jean' , 'Rose' , 'David' , 'Tom' , 'Jack' , 'Daniel' , 'Andrew' ] ,
'class_id' : [ 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 4 , 1 , 1 , 1 , 2 , 2 , 2 , 3 , 3 , 3 , 2 ] ,
'gender' : [ 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' , 'M' , 'M' , 'F' , 'F' ] ,
'age' : [ 20 , 21 , 22 , 20 , 21 , 22 , 23 , 20 , 21 , 22 , 20 , 21 , 22 , 23 , 20 , 21 , 22 , 20 , 21 , 22 ] ,
'score' : [ 98 , 74 , 67 , 38 , 65 , 29 , 32 , 34 , 85 , 64 , 52 , 38 , 26 , 89 , 68 , 46 , 32 , 78 , 79 , 87 ] } )
right = pd. DataFrame(
{ 'class_id' : [ 1 , 2 , 3 , 5 ] ,
'class_name' : [ 'ClassA' , 'ClassB' , 'ClassC' , 'ClassE' ] } )
left
student_id student_name class_id gender age score 0 1 Alex 1 M 20 98 1 2 Amy 1 M 21 74 2 3 Allen 1 F 22 67 3 4 Alice 2 F 20 38 4 5 Ayoung 2 M 21 65 5 6 Billy 2 M 22 29 6 7 Brian 3 F 23 32 7 8 Bran 3 F 20 34 8 9 Bryce 3 M 21 85 9 10 Betty 4 M 22 64 10 11 Emma 1 F 20 52 11 12 Marry 1 F 21 38 12 13 Allen 1 M 22 26 13 14 Jean 2 M 23 89 14 15 Rose 2 F 20 68 15 16 David 2 F 21 46 16 17 Tom 3 M 22 32 17 18 Jack 3 M 20 78 18 19 Daniel 3 F 21 79 19 20 Andrew 2 F 22 87
right
class_id class_name 0 1 ClassA 1 2 ClassB 2 3 ClassC 3 5 ClassE
r = pd. merge( left, right, how= 'inner' )
r
student_id student_name class_id gender age score class_name 0 1 Alex 1 M 20 98 ClassA 1 2 Amy 1 M 21 74 ClassA 2 3 Allen 1 F 22 67 ClassA 3 11 Emma 1 F 20 52 ClassA 4 12 Marry 1 F 21 38 ClassA 5 13 Allen 1 M 22 26 ClassA 6 4 Alice 2 F 20 38 ClassB 7 5 Ayoung 2 M 21 65 ClassB 8 6 Billy 2 M 22 29 ClassB 9 14 Jean 2 M 23 89 ClassB 10 15 Rose 2 F 20 68 ClassB 11 16 David 2 F 21 46 ClassB 12 20 Andrew 2 F 22 87 ClassB 13 7 Brian 3 F 23 32 ClassC 14 8 Bran 3 F 20 34 ClassC 15 9 Bryce 3 M 21 85 ClassC 16 17 Tom 3 M 22 32 ClassC 17 18 Jack 3 M 20 78 ClassC 18 19 Daniel 3 F 21 79 ClassC
分组聚合
grouped = r. groupby( by= [ 'class_id' , 'gender' ] )
grouped. size( )
class_id gender
1 F 3
M 3
2 F 4
M 3
3 F 3
M 3
dtype: int64
grouped. get_group( ( 2 , 'F' ) )
student_id student_name class_id gender age score class_name 6 4 Alice 2 F 20 38 ClassB 10 15 Rose 2 F 20 68 ClassB 11 16 David 2 F 21 46 ClassB 12 20 Andrew 2 F 22 87 ClassB
grouped. agg( { 'score' : np. mean} )
r = grouped. agg( { 'score' : np. mean, 'age' : [ np. max , np. min ] } )
r
score age mean amax amin class_id gender 1 F 52.333333 22 20 M 66.000000 22 20 2 F 59.750000 22 20 M 61.000000 23 21 3 F 48.333333 23 20 M 65.000000 22 20
透视表
data = pd. merge( left, right)
data. pivot_table( index= 'class_id' )
age score student_id class_id 1 21.000000 59.166667 7.000000 2 21.285714 60.285714 11.428571 3 21.166667 56.666667 13.000000
data
student_id student_name class_id gender age score class_name 0 1 Alex 1 M 20 98 ClassA 1 2 Amy 1 M 21 74 ClassA 2 3 Allen 1 F 22 67 ClassA 3 11 Emma 1 F 20 52 ClassA 4 12 Marry 1 F 21 38 ClassA 5 13 Allen 1 M 22 26 ClassA 6 4 Alice 2 F 20 38 ClassB 7 5 Ayoung 2 M 21 65 ClassB 8 6 Billy 2 M 22 29 ClassB 9 14 Jean 2 M 23 89 ClassB 10 15 Rose 2 F 20 68 ClassB 11 16 David 2 F 21 46 ClassB 12 20 Andrew 2 F 22 87 ClassB 13 7 Brian 3 F 23 32 ClassC 14 8 Bran 3 F 20 34 ClassC 15 9 Bryce 3 M 21 85 ClassC 16 17 Tom 3 M 22 32 ClassC 17 18 Jack 3 M 20 78 ClassC 18 19 Daniel 3 F 21 79 ClassC
data. pivot_table( index= 'class_id' , values= 'score' )
score class_id 1 59.166667 2 60.285714 3 56.666667
data. pivot_table( index= [ 'class_id' , 'gender' ] , values= 'score' )
score class_id gender 1 F 52.333333 M 66.000000 2 F 59.750000 M 61.000000 3 F 48.333333 M 65.000000
data. pivot_table( index= [ 'class_id' , 'gender' ] ,
columns= 'age' , values= 'score' )
age 20 21 22 23 class_id gender 1 F 52.0 38.0 67.0 NaN M 98.0 74.0 26.0 NaN 2 F 53.0 46.0 87.0 NaN M NaN 65.0 29.0 89.0 3 F 34.0 79.0 NaN 32.0 M 78.0 85.0 32.0 NaN
data. pivot_table( index= [ 'class_id' , 'gender' ] ,
columns= 'age' , values= 'score' , aggfunc= np. max )
age 20 21 22 23 class_id gender 1 F 52.0 38.0 67.0 NaN M 98.0 74.0 26.0 NaN 2 F 68.0 46.0 87.0 NaN M NaN 65.0 29.0 89.0 3 F 34.0 79.0 NaN 32.0 M 78.0 85.0 32.0 NaN
r = data. pivot_table( index= [ 'class_id' , 'gender' ] ,
columns= 'age' , values= 'score' , aggfunc= np. max ,
margins= True )
r
age 20 21 22 23 All class_id gender 1 F 52.0 38.0 67.0 NaN 67 M 98.0 74.0 26.0 NaN 98 2 F 68.0 46.0 87.0 NaN 87 M NaN 65.0 29.0 89.0 89 3 F 34.0 79.0 NaN 32.0 79 M 78.0 85.0 32.0 NaN 85 All 98.0 85.0 87.0 89.0 98