Python_Series
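Introduction to Series
The pandas module has two main data structures: 1. Series 2. DataFrame
A Series is a one-dimensional labeled array built on NumPy's ndarray.
Creating a Series:
pd.Series([list], index=[list]): the data argument is a list. index is optional; if it is omitted, the index defaults to integer labels starting from 0. When given, the index must have the same length as the data.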
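A minimal sketch of the two creation forms described above, assuming pandas and NumPy are installed (the variable names `labeled` and `default` are illustrative). It also shows that the values are stored in a NumPy array and that a length mismatch between data and index is rejected.

```python
import numpy as np
import pandas as pd

# Explicit index: one label per element, same length as the data list
labeled = pd.Series([10, 20, 30], index=["x", "y", "z"])

# No index given: pandas assigns integer labels 0, 1, 2, ...
default = pd.Series([10, 20, 30])

print(labeled.index.tolist())    # ['x', 'y', 'z']
print(default.index.tolist())    # [0, 1, 2]

# The values are held in a NumPy array underneath
print(type(default.to_numpy()))  # <class 'numpy.ndarray'>

# Data and index of different lengths raise a ValueError
try:
    pd.Series([1, 2, 3], index=["a", "b"])
except ValueError as exc:
    print("ValueError:", exc)
```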
---------------------------------------------------------------------------------------------------------------------------------------------
In : specifying the index
import pandas as pd
import numpy as np
s1 = pd.Series(['a','b','c','d','e'],index=[1,2,3,4,5])
s1
Out:
1 a
2 b
3 c
4 d
5 e
dtype: object
---------------------------------------------------------
In : default index
s2 = pd.Series(['a','b','c','d','e'])
s2
Out:
0 a
1 b
2 c
3 d
4 e
dtype: object
---------------------------------------------------------
In :
s3 = pd.Series(np.arange(3),index=['a','b','c'])
s3
Out[4]:
a 0
b 1
c 2
dtype: int32
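The `int32` above is platform dependent: with NumPy 1.x on Windows, `np.arange` defaults to 32-bit integers, while on Linux/macOS (and with NumPy 2.x) it is typically `int64`. If a fixed dtype matters, it can be requested explicitly; a short sketch:

```python
import numpy as np
import pandas as pd

# Ask NumPy for 64-bit integers so the result is the same on every platform
s3 = pd.Series(np.arange(3, dtype=np.int64), index=["a", "b", "c"])
print(s3.dtype)      # int64

# Alternatively, pass dtype to the Series constructor
s3_alt = pd.Series(np.arange(3), index=["a", "b", "c"], dtype="int64")
print(s3_alt.dtype)  # int64
```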
---------------------------------------------------------
s4 = pd.Series({dict}): create a Series from a dict; the keys become the index
In [6]:
s4 = pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5,'f':6})
s4
Out[6]:
a 1
b 2
c 3
d 4
e 5
f 6
dtype: int64
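When a dict is combined with an explicit `index`, pandas takes only the listed keys, in the listed order, and fills labels that are missing from the dict with NaN. A small sketch with illustrative data:

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# Keys become the index, in insertion order
s = pd.Series(data)
print(s.index.tolist())  # ['a', 'b', 'c']

# Explicit index: selects and reorders keys; 'z' is not in the dict,
# so it becomes NaN and the dtype changes to float64
picked = pd.Series(data, index=["c", "a", "z"])
print(picked)
```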
---------------------------------------------------------
In [7]:
s4.values
Out[7]:
array([1, 2, 3, 4, 5, 6], dtype=int64)
---------------------------------------------------------
In [8]: index
s4.index
Out[8]:
Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
---------------------------------------------------------
In [9]: slicing
s4[:]
Out[9]:
a 1
b 2
c 3
d 4
e 5
f 6
dtype: int64
---------------------------------------------------------
In [10]:
s4.iloc[2]  # by position; plain s4[2] falls back to position here, which is deprecated in pandas 2.x
Out[10]:
3
---------------------------------------------------------
In [11]:
s4[0:2]
Out[11]:
a 1
b 2
dtype: int64
---------------------------------------------------------
In [13]:
s4[3:]
Out[13]:
d 4
e 5
f 6
dtype: int64
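The integer slices above are positional, so the stop position is excluded. Slicing by label with `.loc` includes both endpoints, while `.iloc` is always positional. A sketch that rebuilds `s4` with the same data:

```python
import pandas as pd

s4 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6})

# Positional slice: positions 0 and 1, stop position 2 excluded
print(s4.iloc[0:2].tolist())     # [1, 2]

# Label slice: both 'a' and 'c' are included
print(s4.loc["a":"c"].tolist())  # [1, 2, 3]

# A single element by position and by label
print(s4.iloc[2], s4.loc["c"])   # 3 3
```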
---------------------------------------------------------
In [14]: select by index label
s4['a']
Out[14]:
1
---------------------------------------------------------
In [19]: select by several index labels
s4[['a','b','c']]
Out[19]:
a 1
b 2
c 3
dtype: int64
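Selecting with a list of labels returns the elements in the requested order. If any label is missing, `.loc` raises a `KeyError`, whereas `reindex` returns NaN for it. A sketch using the same data as `s4`:

```python
import pandas as pd

s4 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6})

# Labels come back in the order requested
print(s4.loc[["c", "a"]].tolist())  # [3, 1]

# A label that does not exist raises KeyError
try:
    s4.loc[["a", "z"]]
except KeyError as exc:
    print("KeyError:", exc)

# reindex fills missing labels with NaN instead of raising
print(s4.reindex(["a", "z"]))
```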
---------------------------------------------------------
In [21]: select by a condition on the values
s4[s4.values>3]
Out[21]:
d 4
e 5
f 6
dtype: int64
---------------------------------------------------------
In [23]: select by a condition on the index
s4[s4.index=='d']
Out[23]:
d 4
dtype: int64
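Boolean masks can also be built from the Series itself and combined with `&` (and) or `|` (or); each condition needs its own parentheses because of operator precedence. A sketch on the same data as `s4`:

```python
import pandas as pd

s4 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6})

# Comparing the Series directly gives the same mask as s4.values > 3
print(s4[s4 > 3].tolist())                     # [4, 5, 6]

# Combine conditions; each one is wrapped in parentheses
print(s4[(s4 > 2) & (s4 < 5)].tolist())        # [3, 4]

# Condition on the index: keep several labels at once
print(s4[s4.index.isin(["a", "d"])].tolist())  # [1, 4]
```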
---------------------------------------------------------
In [24]:
s4.mean()  # mean
Out[24]:
3.5
---------------------------------------------------------
In [27]:
s4.median()  # median
Out[27]:
3.5
---------------------------------------------------------
In [26]:
s4.sum()  # sum
Out[26]:
21
---------------------------------------------------------
In [28]:
s4.std()  # standard deviation (sample, ddof=1)
Out[28]:
1.8708286933869707
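`Series.std()` computes the sample standard deviation by default (`ddof=1`, dividing by n - 1). The sketch below reproduces 1.8708... by hand for the values 1 to 6 and contrasts it with the population version (`ddof=0`):

```python
import math
import pandas as pd

s4 = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

mean = s4.sum() / len(s4)              # 21 / 6 = 3.5
sq_dev = ((s4 - mean) ** 2).sum()      # 17.5

sample_std = math.sqrt(sq_dev / (len(s4) - 1))  # divide by n - 1 = 5
population_std = math.sqrt(sq_dev / len(s4))    # divide by n = 6

print(math.isclose(s4.std(), sample_std))            # True
print(math.isclose(s4.std(ddof=0), population_std))  # True
```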
---------------------------------------------------------
In [30]:
s4.value_counts()  # number of occurrences of each value
Out[30]:
6 1
5 1
4 1
3 1
2 1
1 1
dtype: int64
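Every value in `s4` is unique, so each count above is 1. With repeated values, `value_counts` sorts by count with the most frequent first, and `normalize=True` returns proportions. A sketch with illustrative data:

```python
import pandas as pd

s = pd.Series(["x", "y", "y", "z", "z", "z"])

counts = s.value_counts()  # sorted by count, most frequent first
print(counts.to_dict())     # {'z': 3, 'y': 2, 'x': 1}

# Proportions instead of counts
print(s.value_counts(normalize=True)["z"])  # 0.5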
---------------------------------------------------------
In [34]: arithmetic
s4/2, s4//2, s4%2, s4**2  # element-wise: divide by 2, floor-divide by 2, remainder mod 2, square
Out[34]:
(a 0.5
b 1.0
c 1.5
d 2.0
e 2.5
f 3.0
dtype: float64, a 0
b 1
c 1
d 2
e 2
f 3
dtype: int64, a 1
b 0
c 1
d 0
e 1
f 0
dtype: int64, a 1
b 4
c 9
d 16
e 25
f 36
dtype: int64)
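Each operator above is applied element by element and returns a new Series with the same index, which is why `s4` is unchanged afterwards. NumPy ufuncs behave the same way. A sketch:

```python
import numpy as np
import pandas as pd

s4 = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

halves = s4 / 2    # true division: result is float64
floors = s4 // 2   # floor division: stays integer
squares = s4 ** 2

# NumPy ufuncs also work element-wise and keep the index
roots = np.sqrt(s4)

print(floors.tolist())  # [0, 1, 1, 2, 2, 3]
print(squares["f"])     # 36
```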
---------------------------------------------------------
In [35]:
s4
Out[35]:
a 1
b 2
c 3
d 4
e 5
f 6
dtype: int64
---------------------------------------------------------
In [40]:
s5 = pd.Series({'a':1,'b':2,'g':7})
s5
Out[40]:
a 1
b 2
g 7
dtype: int64
---------------------------------------------------------
In [45]:
s6=s4+s5
s6
Out[45]:
a 2.0
b 4.0
c NaN
d NaN
e NaN
f NaN
g NaN
dtype: float64
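`s4 + s5` aligns the two Series on their index labels, so any label present in only one of them yields NaN. If a missing label should count as 0 instead, the method form `Series.add` accepts a `fill_value`; a sketch with the same data:

```python
import pandas as pd

s4 = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6})
s5 = pd.Series({"a": 1, "b": 2, "g": 7})

# A label missing from one side is treated as 0 before adding
total = s4.add(s5, fill_value=0)
print(total)
```

The same `fill_value` parameter exists on `sub`, `mul` and `div`.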
---------------------------------------------------------
In [44]:
s6[s6.notnull()]  # select non-null values
Out[44]:
a 2.0
b 4.0
dtype: float64
---------------------------------------------------------
In [46]:
s6[s6.isnull()]  # select null values
Out[46]:
c NaN
d NaN
e NaN
f NaN
g NaN
dtype: float64
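`isna()` and `notna()` are aliases of `isnull()` and `notnull()`, and `dropna()` is a shortcut for the non-null selection above. A sketch that rebuilds `s6` with the values shown:

```python
import numpy as np
import pandas as pd

s6 = pd.Series([2.0, 4.0, np.nan, np.nan, np.nan, np.nan, np.nan],
               index=list("abcdefg"))

# Summing a boolean mask counts the True values
print(s6.isna().sum())   # 5
print(s6.notna().sum())  # 2

# dropna() gives the same result as s6[s6.notnull()]
print(s6.dropna().equals(s6[s6.notnull()]))  # True
```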
---------------------------------------------------------
In [48]:
s6 = s6.fillna(2)  # fill null values with 2
In [49]:
s6
Out[49]:
a 2.0
b 4.0
c 2.0
d 2.0
e 2.0
f 2.0
g 2.0
dtype: float64
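`fillna` returns a new Series, which is why the result is reassigned to `s6` above. The fill value can also be a statistic of the non-missing data; a sketch with illustrative values:

```python
import numpy as np
import pandas as pd

s6 = pd.Series([2.0, 4.0, np.nan, np.nan], index=list("abcd"))

# Without reassignment the original keeps its NaNs
filled = s6.fillna(0)
print(s6.isna().sum(), filled.isna().sum())  # 2 0

# mean() skips NaN by default: (2.0 + 4.0) / 2 = 3.0
by_mean = s6.fillna(s6.mean())
print(by_mean.tolist())  # [2.0, 4.0, 3.0, 3.0]
```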
---------------------------------------------------------
In [52]:
s6.drop('g')  # drop label 'g' (returns a new Series; s6 itself is unchanged)
Out[52]:
a 2.0
b 4.0
c 2.0
d 2.0
e 2.0
f 2.0
dtype: float64
---------------------------------------------------------
In [53]:
s6
Out[53]:
a 2.0
b 4.0
c 2.0
d 2.0
e 2.0
f 2.0
g 2.0
dtype: float64
---------------------------------------------------------
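As In [53] shows, `drop` leaves `s6` unchanged. A sketch of dropping several labels at once and of reassigning the result to actually modify the variable (illustrative values):

```python
import pandas as pd

s6 = pd.Series([2.0, 4.0, 2.0, 2.0], index=list("abcg"))

# drop returns a new Series; the original is untouched
without_g = s6.drop("g")
print(len(s6), len(without_g))  # 4 3

# Several labels can be dropped by passing a list
print(s6.drop(["a", "g"]).index.tolist())  # ['b', 'c']

# Reassign to change s6 itself
s6 = s6.drop("g")
print(s6.index.tolist())  # ['a', 'b', 'c']
```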