Using Series in the pandas library
Three ways to construct/initialize a Series:
1) Build a Series from a list
import pandas as pd
my_list=[7,'Beijing','19大',3.1415,-10000,'Happy']
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0 7
1 Beijing
2 19大
3 3.1415
4 -10000
5 Happy
dtype: object
1.a) By default pandas uses the integers 0 to n-1 as the Series index, but you can also supply your own index. You can think of the index as the keys of a dict.
s=pd.Series([7,'Beijing','19大',3.1415,-10000,'Happy'],
index=['A','B','C','D','E','F'])
print(s)
A 7
B Beijing
C 19大
D 3.1415
E -10000
F Happy
dtype: object
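To make the dict-key analogy concrete, here is a minimal sketch that looks a value up by its label and reads the labels and values back separately. The three-element Series is a shortened, illustrative version of the one above.

```python
import pandas as pd

s = pd.Series([7, 'Beijing', 3.1415], index=['A', 'B', 'C'])

# Labels behave like dict keys: look a value up by its label.
value_b = s['B']

# The labels and the values can also be read back separately,
# or together as an ordinary dict.
labels = list(s.index)
as_dict = s.to_dict()

print(value_b)
print(labels)
print(as_dict)
```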
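2) Build a Series from a dict, since a Series is itself essentially a key-value structure. A None value is stored as NaN. (The outputs in these notes list the keys alphabetically, as older pandas versions did; newer versions keep the dict's insertion order.)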
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
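The output above shows two side effects of building from a dict: None turns into NaN, and the integers become floats. The sketch below checks both on a smaller, illustrative dict. It assumes a recent pandas release that keeps dict insertion order, so sort_index() is used to get the alphabetical ordering seen in the output above.

```python
import pandas as pd

cities = {'shenzhen': 50000, 'Beijing': 55000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

# None is stored as NaN, which forces the integers to become floats.
print(apts.dtype)
print(apts.isna()['Suzhou'])

# Assumption: this pandas version keeps insertion order, so sort explicitly.
# Uppercase letters sort before lowercase ones, so 'shenzhen' comes last.
sorted_apts = apts.sort_index()
print(list(sorted_apts.index))
```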
3) Build a Series from a numpy array
import numpy as np
d=pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(d)
a -0.329401
b -0.435921
c -0.232267
d -0.846713
e -0.406585
dtype: float64
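Because np.random.randn draws fresh numbers on every run, the values above will differ each time. A possible variant, assuming NumPy's Generator API is available, fixes a seed so the Series is reproducible. The seed value 0 is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # the seed value is an arbitrary choice
d = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])

# The Series keeps the array's dtype and can hand the data back as an ndarray.
print(d.dtype)
print(type(d.to_numpy()))
```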
Selecting data:
1) You can treat a Series much like a list and apply all the usual slicing operations
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
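print(apts.iloc[3]) # select by position; plain apts[3] on a string-labelled index is deprecated in recent pandas
60000.0
print(apts.iloc[[3,4,1]]) # a list of positions returns the elements in that order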
Shanghai 60000.0
Suzhou NaN
Guangzhou 45000.0
Name: income, dtype: float64
print(apts[1:])
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
print(apts[:-2])
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Name: income, dtype: float64
print(apts[1:]+apts[:-1]) # addition aligns on index labels, not positions; a label present on only one side gives NaN
Beijing NaN
Guangzhou 90000.0
Hangzhou 40000.0
Shanghai 120000.0
Suzhou NaN
shenzhen NaN
Name: income, dtype: float64
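The NaN values at both ends of the sum above come from label alignment: each slice is missing one label that the other has. A minimal sketch on a three-element Series (the labels x, y, z are made up for illustration) shows the same effect.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])

left = s[1:]    # labels y, z
right = s[:-1]  # labels x, y
total = left + right

# Only 'y' is present on both sides, so only 'y' gets a number (2.0 + 2.0).
print(total)
```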
2) A Series also behaves like a dict: the index defined earlier is what you use to select data
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts['Shanghai']) ###
60000.0
print('Hangzhou' in apts)
True
print('Chongqing' in apts)
False
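Continuing the dict analogy, a Series also offers get(), which returns a default instead of raising KeyError for a missing label. The sketch below uses a two-city Series and an arbitrary default of -1, both chosen only for illustration.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

# Like dict.get, Series.get returns a default for a label that is absent.
known = apts.get('Shanghai')
unknown = apts.get('Chongqing', -1)  # -1 is an arbitrary placeholder default

print(known, unknown)
```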
3) Boolean indexing, much like in numpy
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
at_most_50000=(apts<=50000) ### the comparison is <=, so 50000 itself is included
print(apts[at_most_50000])
Guangzhou 45000.0
Hangzhou 20000.0
shenzhen 50000.0
Name: income, dtype: float64
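Boolean masks can also be combined. The sketch below selects a range with two conditions; the bounds 30000 and 60000 are arbitrary illustration values. Note that comparisons against NaN evaluate to False, so the missing entry is never selected.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Suzhou': None}, name='income')

# Combine masks with & (and), | (or), ~ (not), each condition in parentheses.
# Python's own and/or/not do not work element-wise on a Series.
mid_range = apts[(apts >= 30000) & (apts < 60000)]
print(mid_range)
```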
Note: the usual numpy-style statistics such as mean, median, max and min are available as Series methods, and they skip NaN by default
print(apts.mean())
46000.0
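The mean above is 46000.0 because the NaN for Suzhou is left out of the calculation. A small sketch on an illustrative three-entry Series contrasts the default with skipna=False and with plain NumPy on the raw array.

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None})

default_mean = apts.mean()             # NaN is skipped: (55000 + 60000) / 2
strict_mean = apts.mean(skipna=False)  # NaN propagates into the result
raw_mean = np.mean(apts.to_numpy())    # plain NumPy on the array also gives NaN

print(default_mean, strict_mean, raw_mean)
print(apts.median(), apts.max(), apts.min())
```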
Assigning to Series elements:
1) Assign directly by index label
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0
apts['shenzhen']=70000 ###
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
New income of shenzhen:70000.0
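Label assignment can also be written with .loc, which makes it explicit that the key is a label. As a sketch, assigning to a label that does not exist yet appends a new entry; the 'Tianjin' value here is made up for illustration.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'shenzhen': 50000.0}, name='income')

apts.loc['shenzhen'] = 70000  # overwrite the value at an existing label
apts.loc['Tianjin'] = 40000   # a new label is appended to the Series

print(apts)
```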
2) Don't forget the boolean indexing shown above; it works for assignment too
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000) ###
print(less_than_50000)
apts[less_than_50000]=40000 ###
print(apts)
Beijing False
Guangzhou True
Hangzhou True
Shanghai False
Suzhou False
shenzhen False
Name: income, dtype: bool
Beijing 55000.0
Guangzhou 40000.0
Hangzhou 40000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
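The boolean assignment above changes apts in place. If the original values should be kept, one alternative is Series.mask, which returns a new Series instead. This is a sketch of that alternative, not the approach used in these notes.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'Guangzhou': 45000.0, 'Suzhou': None})

# mask() replaces values where the condition is True and returns a new Series.
# NaN < 50000 is False, so the missing entry stays NaN.
raised = apts.mask(apts < 50000, 40000)

print(raised)
print(apts['Guangzhou'])  # the original Series is unchanged
```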
Arithmetic operations
import pandas as pd
import numpy as np
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
print(apts/2) ###
print(apts**1.5) ###
print(np.log(apts)) ###
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
print(apts+apts2) ###
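The snippet above prints no output here, so the following sketch reproduces the same kinds of operation on two small, illustrative Series. It also shows Series.add with fill_value, an option not used in these notes, which treats a label missing on one side as 0 instead of producing NaN.

```python
import numpy as np
import pandas as pd

a = pd.Series({'Beijing': 55000.0, 'Hangzhou': 40000.0})
b = pd.Series({'Beijing': 10000, 'Tianjin': 40000})

halved = a / 2      # scalar arithmetic is applied element-wise
logged = np.log(a)  # NumPy ufuncs return a Series with the same index

plain = a + b                    # labels missing on one side give NaN
filled = a.add(b, fill_value=0)  # treat the missing side as 0 instead

print(halved)
print(logged)
print(plain)
print(filled)
```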
Missing data
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
Beijing 55000.0
Guangzhou 40000.0
Hangzhou 40000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 70000.0
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
Beijing 10000
Chongqing 30000
Guangzhou 7000
Shanghai 8000
Tianjin 40000
shenzhen 6000
dtype: int64
print('Hangzhou' in apts) ###
print('Hangzhou' in apts2)
True
False
print(apts.notnull()) # boolean mask: True where a value is present ###
Beijing True
Guangzhou True
Hangzhou True
Shanghai True
Suzhou False
shenzhen True
Name: income, dtype: bool
print(apts.isnull()) ###
Beijing False
Guangzhou False
Hangzhou False
Shanghai False
Suzhou True
shenzhen False
Name: income, dtype: bool
print(apts[apts.isnull()]) # use the isnull() boolean mask to select the missing elements
Suzhou NaN
Name: income, dtype: float64
apts=apts+apts2 # adding Series whose indexes differ: a label missing from either side becomes NaN
print(apts)
Beijing 65000.0
Chongqing NaN
Guangzhou 47000.0
Hangzhou NaN
Shanghai 68000.0
Suzhou NaN
Tianjin NaN
shenzhen 76000.0
dtype: float64
apts[apts.isnull()]=apts.mean() # fill the missing positions with the mean (not the median); NaN is skipped when computing it
print(apts)
Beijing 65000.0
Chongqing 64000.0
Guangzhou 47000.0
Hangzhou 64000.0
Shanghai 68000.0
Suzhou 64000.0
Tianjin 64000.0
shenzhen 76000.0
dtype: float64
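The mask-and-assign step above can also be written with the dedicated missing-data methods. This sketch, on a shortened illustrative Series, uses fillna to fill with the mean and dropna to discard missing entries; both return new Series and leave the original untouched.

```python
import pandas as pd

total = pd.Series({'Beijing': 65000.0, 'Chongqing': None, 'Guangzhou': 47000.0})

# mean() ignores NaN, so the fill value is (65000 + 47000) / 2 = 56000.
filled = total.fillna(total.mean())

# Alternatively, drop the missing entries altogether.
dropped = total.dropna()

print(filled)
print(dropped)
```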