An In-Depth Look at the pandas Library and Its Applications

Article link: https://blog.youkuaiyun.com/u014403897/article/details/45037797

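Introduction: pandas is built on top of NumPy and extends it with labeled indexing and built-in handling of missing data. The specific conveniences are covered in more detail later.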

Importing the package:

import pandas as pd

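Series: essentially a one-dimensional array in which every value is paired with an index label, so it behaves much like an ordered list of key-value pairs. For example: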

>>> import pandas as pd
>>> a=pd.Series([4,5,6],dtype='float64')
>>> a
0    4
1    5
2    6
dtype: float64

>>> a=pd.Series([4,5,6],dtype='float64',index=['hah','lol','cry'])
>>> a
hah    4
lol    5
cry    6
dtype: float64


>>> a=pd.Series([4,['55','24'],6],index=['hah','lol','cry']) # mixed element types: no numeric dtype can be specified, so the result is object
>>> a
hah           4
lol    [55, 24]
cry           6
dtype: object
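
As a script rather than an interactive session, the construction above might be sketched as follows. The variable names are illustrative only and are not taken from the original session.

```python
import pandas as pd

# Same values as above, with explicit labels and an explicit dtype.
s = pd.Series([4, 5, 6], index=['hah', 'lol', 'cry'], dtype='float64')

labels = list(s.index)       # the "keys" of the Series
values = s.tolist()          # the "values", as plain Python floats
dtype_name = str(s.dtype)

# A list with mixed element types falls back to the generic object dtype.
mixed = pd.Series([4, ['55', '24'], 6], index=['hah', 'lol', 'cry'])
mixed_dtype = str(mixed.dtype)

print(labels, values, dtype_name, mixed_dtype)
```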

# Lookups:

>>> a=pd.Series([4,55,6],index=['hah','lol','cry'])
>>> a.hah
4
>>> a['lol']
55

>>> 'lol' in a
True
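
The lookups above can also be written with the explicit accessors, which avoids ambiguity between labels and positions. This is a minimal sketch; the label 'nope' and the default value -1 are made up for illustration.

```python
import pandas as pd

s = pd.Series([4, 55, 6], index=['hah', 'lol', 'cry'])

by_label = s.loc['lol']          # explicit label-based lookup
by_position = s.iloc[1]          # explicit position-based lookup
fallback = s.get('nope', -1)     # a missing label returns the default instead of raising

has_label = 'lol' in s           # `in` tests the index labels, not the values
value_as_label = 55 in s         # so a value is not found this way
has_value = bool((s == 55).any())  # test the values explicitly when needed

print(by_label, by_position, fallback, has_label, value_as_label, has_value)
```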

Converting from a dictionary:

A Python dictionary converts easily into a Series; the keys become the index and the values become the data:

>>> a={}
>>> a['a']=1
>>> a['b']=2
>>> a
{'a': 1, 'b': 2}
>>> a=pd.Series(a)
>>> a
a    1
b    2
dtype: int64
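
The conversion also works in the other direction, and an index can be supplied to select or reorder keys. A small sketch, assuming the label 'z' as a deliberately missing key:

```python
import pandas as pd

d = {'a': 1, 'b': 2}

s = pd.Series(d)        # keys become the index, values become the data
back = s.to_dict()      # and back to a plain dict

# Passing index= selects and orders the keys; labels absent from the
# dict ('z' here, a made-up label) are filled with NaN.
picked = pd.Series(d, index=['b', 'a', 'z'])

print(back)
print(picked)
```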

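The simplest use case: arithmetic between two Series aligns automatically on the index labels, and any label that is missing on either side yields NaN: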

>>> a=pd.Series(a)
>>> a
a    1
b    2
dtype: int64
>>> b={}
>>> b['c']=None
>>> b['b']=3
>>> b['f']=5
>>> b
{'c': None, 'b': 3, 'f': 5}
>>> b=pd.Series(b)
>>> b
b     3
c   NaN
f     5
dtype: float64
>>> a+b
a   NaN
b     5
c   NaN
f   NaN
dtype: float64
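
When NaN for unmatched labels is not what you want, the `add` method accepts a `fill_value`. The sketch below reuses the two Series from the session; the explicit `dtype='float64'` on `b` is an addition here to keep the example independent of dtype inference.

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 2})
b = pd.Series({'c': None, 'b': 3, 'f': 5}, dtype='float64')

# Plain addition: a label present on only one side gives NaN.
plain = a + b

# add() with fill_value: a label missing on one side is treated as 0.
# Label 'c' stays NaN because it is missing from a AND is NaN in b.
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```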

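DataFrame:

A DataFrame can be thought of as a two-dimensional table (higher-dimensional data can be represented through hierarchical indexing), or as a collection of Series that share one index. It is an extremely powerful data structure, but only a brief introduction is given here; more detailed usage is left for concrete cases later.

Constructing a DataFrame from a dictionary whose values are lists (the lists must all have the same length, so that they form a proper table):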

>>> data={'state':['a','c','e'],'year':[200,5000,1000],'pop':[2,13,None]}
>>> data
{'state': ['a', 'c', 'e'], 'pop': [2, 13, None], 'year': [200, 5000, 1000]}

>>> pd.DataFrame(data)
   pop state  year
0    2     a   200
1   13     c  5000
2  NaN     e  1000

# Without row labels, individual records are awkward to refer to, so pass an index:

>>> pd.DataFrame(data,columns=['year','pop','state','hehe'],index=['a','c','e'])
   year  pop state hehe
a   200    2     a  NaN
c  5000   13     c  NaN
e  1000  NaN     e  NaN
# columns: reorders the columns, or adds new ones (filled with NaN)
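
The same construction as a standalone script, with a few checks on the result. The final part illustrates the equal-length requirement; the column names 'x' and 'y' are made up for that purpose.

```python
import pandas as pd

data = {'state': ['a', 'c', 'e'],
        'year': [200, 5000, 1000],
        'pop': [2, 13, None]}

# columns= fixes the column order and adds 'hehe', which is not in data.
df = pd.DataFrame(data,
                  columns=['year', 'pop', 'state', 'hehe'],
                  index=['a', 'c', 'e'])

column_names = list(df.columns)
n_rows, n_cols = df.shape
hehe_all_missing = bool(df['hehe'].isna().all())   # new column is all NaN
missing_pop = int(df['pop'].isna().sum())          # None became one NaN

# Lists of different lengths do not form a table and are rejected.
try:
    pd.DataFrame({'x': [1, 2], 'y': [1]})
    ragged_accepted = True
except ValueError:
    ragged_accepted = False

print(column_names, (n_rows, n_cols), hehe_all_missing, missing_pop, ragged_accepted)
```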

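# Looking up and modifying values

>>> a=pd.DataFrame(data,columns=['year','pop','state','hehe'],index=['a','c','e'])
>>> a.year.a
200
>>> a.loc['a','hehe']=0  # assign through .loc; chained forms such as a.hehe['a']=0 are not guaranteed to write back to a
>>> a
   year  pop state hehe
a   200    2     a    0
c  5000   13     c  NaN
e  1000  NaN     e  NaN
>>> a.loc['a','hehe']=10
>>> a
   year  pop state hehe
a   200    2     a   10
c  5000   13     c  NaN
e  1000  NaN     e  NaN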
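
For reference, label-based reads and writes on a DataFrame are usually done with `.loc` (or `.at` for a single cell). A minimal sketch on the same table; the values written are arbitrary.

```python
import pandas as pd

data = {'state': ['a', 'c', 'e'],
        'year': [200, 5000, 1000],
        'pop': [2, 13, None]}
df = pd.DataFrame(data,
                  columns=['year', 'pop', 'state', 'hehe'],
                  index=['a', 'c', 'e'])

year_a = df.loc['a', 'year']     # read one cell: row label, then column label

df.loc['a', 'hehe'] = 10         # write one cell in place
df.at['c', 'hehe'] = 0           # .at is the scalar-only counterpart of .loc
df.loc['e', 'pop'] = 7           # overwrite the missing population value

print(df)
```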

Summary statistics in pandas:

>>> a
   year  pop state hehe
a   200    2     a   10
c  5000   13     c  NaN
e  1000  NaN     e  NaN
>>> a.sum(1)
a     202
c    5013
e    1000
dtype: float64
>>> a.sum(0)
year     6200
pop        15
state     ace
hehe       10
dtype: object
# axis follows the same convention as NumPy: axis=0 sums down each column, axis=1 sums across each row
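
The axis convention can be checked directly against NumPy. The sketch below is an illustration rather than a reproduction of the session above: it passes `numeric_only=True` so that the string column is skipped explicitly, and uses `np.nansum` as the NumPy counterpart because pandas skips NaN by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [200, 5000, 1000],
                   'pop': [2, 13, np.nan],
                   'state': ['a', 'c', 'e']},
                  index=['a', 'c', 'e'])

# axis=0 collapses the rows: one total per column.
col_totals = df.sum(axis=0, numeric_only=True)
# axis=1 collapses the columns: one total per row.
row_totals = df.sum(axis=1, numeric_only=True)

# The same reductions on the underlying numeric array.
arr = df[['year', 'pop']].to_numpy(dtype='float64')
np_col_totals = np.nansum(arr, axis=0)
np_row_totals = np.nansum(arr, axis=1)

print(col_totals)
print(row_totals)
```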




There are far more statistical methods than can be covered here; look them up in the pandas documentation when you need them.
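
As a starting point, a few of the commonly used ones might look like this. It is a small sketch on the numeric columns of the example table, not a complete list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [200, 5000, 1000],
                   'pop': [2.0, 13.0, np.nan]},
                  index=['a', 'c', 'e'])

means = df.mean()          # per-column mean, NaN skipped
counts = df.count()        # per-column number of non-missing values
year_max = df['year'].max()
summary = df.describe()    # count, mean, std, min, quartiles, max in one table

print(summary)
```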