透视表技巧详解-优快云博客

本文链接：https://blog.youkuaiyun.com/weixin_44039183/article/details/108033194

透视表

透视表

透视表

引入相关库

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

数据获取

df=pd.read_excel('../homework/sales-funnel.xlsx')
df

Acount代表客户账户，Name代表客户名字，Rep代表销售代表名字，Manager代表销售代表老板名字，Product代表卖出的产品，Quantity代表产品质量，Price代表产品价格，Status代表状态

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won
5	218895	Kulas Inc	Daniel Hilton	Debra Henley	CPU	2	40000	pending
6	218895	Kulas Inc	Daniel Hilton	Debra Henley	Software	1	10000	presented
7	412290	Jerde-Hilpert	John Smith	Debra Henley	Maintenance	2	5000	pending
8	740150	Barton LLC	John Smith	Debra Henley	CPU	1	35000	declined
9	141962	Herman LLC	Cedric Moss	Fred Anderson	CPU	2	65000	won
10	163416	Purdy-Kunde	Cedric Moss	Fred Anderson	CPU	1	30000	presented
11	239344	Stokes LLC	Cedric Moss	Fred Anderson	Maintenance	1	5000	pending
12	239344	Stokes LLC	Cedric Moss	Fred Anderson	Software	1	10000	presented
13	307599	Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	Maintenance	3	7000	won
14	688981	Keeling LLC	Wendy Yule	Fred Anderson	CPU	5	100000	won
15	729833	Koepp Ltd	Wendy Yule	Fred Anderson	CPU	2	65000	declined
16	729833	Koepp Ltd	Wendy Yule	Fred Anderson	Monitor	2	5000	presented

生成透视表

在这里插入图片描述
使用pivot_table方法生成透视表，这个函数第一个参数表示我们要对什么数据进行一个透视，index表示要对哪一个columns做一个透视
下面对‘name’这一column做透视表，生成了新的DataFrame，index使用原始表里面‘name’，且经过了去重处理，把同一‘name’的行使用取均值的方法（默认聚合方法）进行了合并（例如Quantiy：（1+1+2/3）=1.3）

pd.pivot_table(df,index=['Name'])

	Account	Price	Quantity
Name
Barton LLC	740150	35000	1.000000
Fritsch, Russel and Anderson	737550	35000	1.000000
Herman LLC	141962	65000	2.000000
Jerde-Hilpert	412290	5000	2.000000
Kassulke, Ondricka and Metz	307599	7000	3.000000
Keeling LLC	688981	100000	5.000000
Kiehn-Spinka	146832	65000	2.000000
Koepp Ltd	729833	35000	2.000000
Kulas Inc	218895	25000	1.500000
Purdy-Kunde	163416	30000	1.000000
Stokes LLC	239344	7500	1.000000
Trantow-Barrows	714466	15000	1.333333

把聚合方法改为sum，Quntity：1+1+2=4

pd.pivot_table(df,index=['Name'],aggfunc='sum')

	Account	Price	Quantity
Name
Barton LLC	740150	35000	1
Fritsch, Russel and Anderson	737550	35000	1
Herman LLC	141962	65000	2
Jerde-Hilpert	412290	5000	2
Kassulke, Ondricka and Metz	307599	7000	3
Keeling LLC	688981	100000	5
Kiehn-Spinka	146832	65000	2
Koepp Ltd	1459666	70000	4
Kulas Inc	437790	50000	3
Purdy-Kunde	163416	30000	1
Stokes LLC	478688	15000	2
Trantow-Barrows	2143398	45000	4

添加两个另外的index，‘Rep’和‘Manager’，新的透视表的index变成了三个

pd.pivot_table(df,index=['Name','Rep','Manager'])

			Account	Price	Quantity
Name	Rep	Manager
Barton LLC	John Smith	Debra Henley	740150	35000	1.000000
Fritsch, Russel and Anderson	Craig Booker	Debra Henley	737550	35000	1.000000
Herman LLC	Cedric Moss	Fred Anderson	141962	65000	2.000000
Jerde-Hilpert	John Smith	Debra Henley	412290	5000	2.000000
Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	307599	7000	3.000000
Keeling LLC	Wendy Yule	Fred Anderson	688981	100000	5.000000
Kiehn-Spinka	Daniel Hilton	Debra Henley	146832	65000	2.000000
Koepp Ltd	Wendy Yule	Fred Anderson	729833	35000	2.000000
Kulas Inc	Daniel Hilton	Debra Henley	218895	25000	1.500000
Purdy-Kunde	Cedric Moss	Fred Anderson	163416	30000	1.000000
Stokes LLC	Cedric Moss	Fred Anderson	239344	7500	1.000000
Trantow-Barrows	Craig Booker	Debra Henley	714466	15000	1.333333

把‘name’去掉，更换‘Rep’和‘Manager’顺序，可以看到一个Manager下面有多个Rep

pd.pivot_table(df,index=['Manager','Rep'])

		Account	Price	Quantity
Manager	Rep
Debra Henley	Craig Booker	720237.0	20000.000000	1.250000
	Daniel Hilton	194874.0	38333.333333	1.666667
	John Smith	576220.0	20000.000000	1.500000
Fred Anderson	Cedric Moss	196016.5	27500.000000	1.250000
Fred Anderson	Wendy Yule	614061.5	44250.000000	3.000000

通过values参数来从‘Account Price Quantity‘指定哪个values，例下面只关心Price

pd.pivot_table(df,index=['Manager','Rep'],values=['Price'])

		Price
Manager	Rep
Debra Henley	Craig Booker	20000.000000
	Daniel Hilton	38333.333333
	John Smith	20000.000000
Fred Anderson	Cedric Moss	27500.000000
Fred Anderson	Wendy Yule	44250.000000

改变聚合方法变成求和

pd.pivot_table(df,index=['Manager','Rep'],values=['Price'],aggfunc='sum')

		Price
Manager	Rep
Debra Henley	Craig Booker	80000
	Daniel Hilton	115000
	John Smith	40000
Fred Anderson	Cedric Moss	110000
Fred Anderson	Wendy Yule	177000

还可以指定多个values

pd.pivot_table(df,index=['Manager','Rep'],values=['Price','Quantity'],aggfunc='sum')

		Price	Quantity
Manager	Rep
Debra Henley	Craig Booker	80000	5
	Daniel Hilton	115000	5
	John Smith	40000	3
Fred Anderson	Cedric Moss	110000	5
Fred Anderson	Wendy Yule	177000	12

columns方法对于每一个values还可以看不同的columns，例如对于Price和Quantity都可以看product的’CPU Maintenance Monitor Software‘等参数

pd.pivot_table(df,index=['Manager','Rep'],values=['Price','Quantity'],columns=['Product'],aggfunc='sum')

		Price				Quantity
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000.0	5000.0	NaN	10000.0	2.0	2.0	NaN	1.0
	Daniel Hilton	105000.0	NaN	NaN	10000.0	4.0	NaN	NaN	1.0
	John Smith	35000.0	5000.0	NaN	NaN	1.0	2.0	NaN	NaN
Fred Anderson	Cedric Moss	95000.0	5000.0	NaN	10000.0	3.0	1.0	NaN	1.0
Fred Anderson	Wendy Yule	165000.0	7000.0	5000.0	NaN	7.0	3.0	2.0	NaN

fill_value可以同于NaN数值的填充

pd.pivot_table(df,index=['Manager','Rep'],values=['Price','Quantity'],columns=['Product'],fill_value=0,aggfunc='sum')

		Price				Quantity
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000	2	2	0	1
	Daniel Hilton	105000	0	0	10000	4	0	0	1
	John Smith	35000	5000	0	0	1	2	0	0
Fred Anderson	Cedric Moss	95000	5000	0	10000	3	1	0	1
Fred Anderson	Wendy Yule	165000	7000	5000	0	7	3	2	0