For a detailed guide to DataFrame, see:
http://blog.youkuaiyun.com/starter_____/article/details/79179562
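Creating a Series
A Series is a one-dimensional, array-like object made up of a set of data (values) and an associated set of data labels (index).
Creating a Series from a list
If no index is specified, an integer index from 0 to N-1 (where N is the length of the data) is created automatically.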
In [1]: import pandas as pd
In [2]: obj=pd.Series([1,5,9])
In [3]: obj
Out[3]:
0 1
1 5
2 9
dtype: int64
In [4]: obj.values
Out[4]: array([1, 5, 9], dtype=int64)
In [5]: obj.index
Out[5]: RangeIndex(start=0, stop=3, step=1)
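The session above can also be written as a small standalone script. This is only a sketch of the same steps; the variable names default_index and values are introduced here for illustration.

```python
import pandas as pd

# Build a Series from a plain list without passing an index.
obj = pd.Series([1, 5, 9])

# With no index given, pandas creates a RangeIndex covering 0..N-1.
default_index = list(obj.index)

# .values exposes the underlying data as a NumPy array.
values = obj.values.tolist()

print(default_index)  # [0, 1, 2]
print(values)         # [1, 5, 9]
```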
If an index is specified, its labels are paired with the values in order:
In [7]: obj1=pd.Series([1,4,7,9],index=['a','s','d','f'])
In [8]: obj1
Out[8]:
a 1
s 4
d 7
f 9
dtype: int64
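As a brief sketch of what an explicit index buys you: values can then be looked up by label, and the index must have the same length as the data. The names by_label, subset and mismatch_raised are illustrative only.

```python
import pandas as pd

obj1 = pd.Series([1, 4, 7, 9], index=['a', 's', 'd', 'f'])

# A single label returns a scalar; a list of labels returns a Series.
by_label = obj1['d']
subset = obj1[['a', 'f']]

# An index whose length differs from the data is rejected with ValueError.
try:
    pd.Series([1, 4, 7], index=['a', 's'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```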
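Creating a Series from a dict
If no index is specified, the keys of the dict become the index of the Series. (Older pandas versions sorted the keys; pandas 0.23 and later on Python 3.6+ keep the dict's insertion order.)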
In [9]: people={'name':'Mike','age':20,'sex':'male','income':5000}
In [10]: obj2=pd.Series(people)
In [11]: obj2
Out[11]:
age 20
name Mike
sex male
income 5000
dtype: object
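Because the order of the index can depend on the pandas version, a script that needs a predictable order may want to sort explicitly. The following is a minimal sketch; same_keys and sorted_labels are names made up for this example.

```python
import pandas as pd

people = {'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000}
obj2 = pd.Series(people)

# The index labels are exactly the dict keys, whatever their order.
same_keys = set(obj2.index) == set(people)

# sort_index() returns a new Series with labels in alphabetical order.
sorted_labels = list(obj2.sort_index().index)
```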
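If an index is specified, the Series takes the values whose dict keys match the index labels; any index label that is not a key of the dict gets NaN to mark a missing value.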
In [12]: obj3=pd.Series(people,index=['name','age','pay','income'])
In [13]: obj3
Out[13]:
name Mike
age 20
pay NaN
income 5000
dtype: object
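The matching rule can be checked directly. In this sketch the index decides both which dict keys are kept and in which order; labels, pay_missing and has_sex are illustrative names.

```python
import pandas as pd

people = {'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000}

# 'pay' is not a key of the dict; 'sex' is a key but is not requested.
obj3 = pd.Series(people, index=['name', 'age', 'pay', 'income'])

labels = list(obj3.index)          # follows the index argument exactly
pay_missing = pd.isna(obj3['pay'])  # unmatched label is filled with NaN
has_sex = 'sex' in obj3.index       # unrequested key is dropped
```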
Accessing a field of a Series
In [66]: obj2['name']
Out[66]: 'Mike'
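Looking up a label that does not exist raises KeyError. A sketch of two ways to deal with that, assuming the same data as obj2; the names pay and raised are introduced here for illustration.

```python
import pandas as pd

obj2 = pd.Series({'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000})

# .loc is the explicit label-based accessor.
age = obj2.loc['age']

# .get returns a default instead of raising when the label is absent.
pay = obj2.get('pay', 0)

# Square-bracket access to a missing label raises KeyError.
try:
    obj2['pay']
    raised = False
except KeyError:
    raised = True
```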
Modifying a Series
Modifying the index of a Series
In [41]: obj1
Out[41]:
a 1
s 4
d 7
f 9
dtype: int64
In [42]: obj1.index=['z','x','c','v']
In [43]: obj1
Out[43]:
z 1
x 4
c 7
v 9
dtype: int64
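Assigning to .index replaces all labels in place. When only some labels should change, or the original should stay untouched, Series.rename with a mapping is one alternative. A minimal sketch (renamed is an illustrative name):

```python
import pandas as pd

obj1 = pd.Series([1, 4, 7, 9], index=['a', 's', 'd', 'f'])

# rename returns a new Series; labels not in the mapping are kept.
renamed = obj1.rename({'a': 'z'})

# Assigning a full list of labels changes obj1 itself.
# The list must have the same length as the Series.
obj1.index = ['z', 'x', 'c', 'v']
```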
Modifying a value in a Series or adding a field to it
In [15]: obj2['age']=30
In [16]: obj2['pay']=3000
In [17]: obj2
Out[17]:
age 30
name Mike
sex male
income 5000
pay 3000
dtype: object
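The same assignment syntax covers both cases: an existing label is overwritten and a new label is appended. A short sketch that rebuilds obj2 from the dict used earlier:

```python
import pandas as pd

obj2 = pd.Series({'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000})

obj2['age'] = 30    # label exists: the value is overwritten
obj2['pay'] = 3000  # label is new: the Series grows by one entry

last_label = obj2.index[-1]  # the new label lands at the end
```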
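Deleting a field from a Series (drop returns a new Series; the original is left unchanged unless the result is assigned back)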
In [21]: obj2.drop(['age'])
Out[21]:
name Mike
sex male
income 5000
pay 3000
dtype: object
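A quick way to see that drop does not modify its input is to compare the Series before and after the call. This is a sketch; trimmed and still_in_original are names chosen for the example.

```python
import pandas as pd

obj2 = pd.Series({'age': 30, 'name': 'Mike', 'sex': 'male',
                  'income': 5000, 'pay': 3000})

# drop builds a new Series without the listed labels.
trimmed = obj2.drop(['age'])

# The original still has the label until the result is assigned back.
still_in_original = 'age' in obj2.index
obj2 = obj2.drop(['age'])
```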
Checking for missing data in a Series
In [22]: obj3
Out[22]:
name Mike
age 20
pay NaN
income 5000
dtype: object
In [23]: obj3.isnull()
Out[23]:
name False
age False
pay True
income False
dtype: bool
In [24]: obj3.notnull()
Out[24]:
name True
age True
pay False
income True
dtype: bool
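The boolean masks returned by isnull and notnull are mostly useful for counting and filtering. A sketch using the same data as obj3; n_missing and present are illustrative names.

```python
import pandas as pd

people = {'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000}
obj3 = pd.Series(people, index=['name', 'age', 'pay', 'income'])

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(obj3.isnull().sum())

# Indexing with the notnull mask keeps only the entries that have a value.
present = obj3[obj3.notnull()]
```

obj3.dropna() gives the same result as the mask-based filter.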
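Combining Series
Adding two Series with + aligns them by index label:
1. If a field exists in both Series and both values are numeric, the result is their sum, e.g. income;
2. If a field exists in both Series and both values are strings, the result is their concatenation, e.g. name;
3. If a field exists in both Series but its value is NaN in one of them, the result is NaN, e.g. pay;
4. If a field exists in only one of the Series, the result is NaN, e.g. sex and age.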
In [25]: obj2
Out[25]:
name Mike
sex male
income 5000
pay 3000
dtype: object
In [26]: obj3
Out[26]:
name Mike
age 20
pay NaN
income 5000
dtype: object
In [27]: obj3+obj2
Out[27]:
age NaN
income 10000
name MikeMike
pay NaN
sex NaN
dtype: object
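When NaN is not the desired result for a missing or unmatched field, Series.add with fill_value is one option. The sketch below uses numeric fields only (fill_value=0 would not make sense for the string fields), and the Series a and b are made up for the example.

```python
import pandas as pd

a = pd.Series({'income': 5000, 'pay': 3000})
b = pd.Series({'income': 5000, 'pay': float('nan'), 'age': 20})

# Plain +: NaN wherever a label is missing or NaN on either side.
total = a + b

# add(fill_value=0): a value missing on one side is treated as 0.
filled = a.add(b, fill_value=0)
```

With fill_value=0, pay becomes 3000 and age becomes 20 instead of NaN, while income is 10000 in both results.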
Building a DataFrame from Series
In [12]: pd.DataFrame({'one':obj2,'two':obj3})
Out[12]:
         one   two
age       20    20
income  5000  5000
name    Mike  Mike
pay      NaN   NaN
sex     male   NaN
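When a DataFrame is built from a dict of Series, the dict keys become column names and the row index is the union of the Series indexes, with NaN where a Series has no entry for a label. A small sketch with two shortened Series (the data here is chosen only for illustration):

```python
import pandas as pd

obj2 = pd.Series({'name': 'Mike', 'sex': 'male', 'income': 5000})
obj3 = pd.Series({'name': 'Mike', 'age': 20, 'income': 5000})

df = pd.DataFrame({'one': obj2, 'two': obj3})

# Rows cover every label that appears in either Series.
row_labels = set(df.index)

# A label absent from one Series becomes NaN in that column.
age_in_one = df.loc['age', 'one']
sex_in_two = df.loc['sex', 'two']
```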
The name attribute of a Series
Both the Series object itself and its index have a name attribute.
In [37]: obj3.name='someone'
In [39]: obj3.index.name='state'
In [40]: obj3
Out[40]:
state
name Mike
age 20
pay NaN
income 5000
Name: someone, dtype: object
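One place where both names show up is when a Series is turned into a DataFrame: reset_index uses the index name and the Series name as column headers. A sketch, with flat as an illustrative variable name:

```python
import pandas as pd

people = {'name': 'Mike', 'age': 20, 'sex': 'male', 'income': 5000}

# name can also be passed directly to the constructor.
obj3 = pd.Series(people, index=['name', 'age', 'pay', 'income'],
                 name='someone')
obj3.index.name = 'state'

# reset_index moves the index into a regular column.
flat = obj3.reset_index()
column_names = list(flat.columns)
```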