Common Series operations:
# Series is the most basic object in pandas. It resembles a one-dimensional NumPy array, but it can also attach custom labels (an index) to its data.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
# Create a Series without passing an index (a default integer index 0..n-1 is used):
sel=Series([1,2,3,4])
print(sel)
print("-"*10)
# Usually you supply your own index:
sel=Series(data=[1,2,3,4],index=["a","b","c","d"])
sel=Series(data=[1,2,3,4],index=list("abcd"))
print(sel)
Output:
0 1
1 2
2 3
3 4
dtype: int64
----------
a 1
b 2
c 3
d 4
dtype: int64
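As a small sketch of the two ways of creating a Series described above, the snippet below compares the default index with a custom one and shows that the number of labels has to match the number of values. The variable names `default_idx` and `labeled` are illustrative only.

```python
import pandas as pd

# Default index: integer positions 0..n-1
default_idx = pd.Series([1, 2, 3, 4])

# Custom index: one label per value
labeled = pd.Series([1, 2, 3, 4], index=list("abcd"))

print(list(default_idx.index))  # [0, 1, 2, 3]
print(list(labeled.index))      # ['a', 'b', 'c', 'd']

# The number of labels must equal the number of values
try:
    pd.Series([1, 2, 3, 4], index=list("abc"))
except ValueError as exc:
    print("ValueError:", exc)
```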
Accessing the contents: the values, the index, and the (label, value) pairs:
sel=Series(data=[1,2,3,4],index=["a","b","c","d"])
sel=Series(data=[1,2,3,4],index=list("abcd"))
print("-"*10)
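# Get the values:
print(sel.values)
# Get the index:
print(sel.index)
# Get the (label, value) pairs (items() replaces iteritems(), which was removed in pandas 2.0):
print(list(sel.items()))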
Output:
----------
[1 2 3 4]
Index(['a', 'b', 'c', 'd'], dtype='object')
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
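A minimal sketch of working with the (label, value) pairs, assuming a recent pandas where `Series.items()` is the iteration method. It also shows `to_dict()` as one way to get the same pairs as a plain dict; the name `as_dict` is illustrative.

```python
import pandas as pd

sel = pd.Series([1, 2, 3, 4], index=list("abcd"))

# items() yields (label, value) pairs one at a time
for label, value in sel.items():
    print(label, value)

# to_dict() builds a plain dict keyed by label
as_dict = sel.to_dict()
print(as_dict)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```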
Converting a dict to a Series:
# Named data rather than dict, so the built-in dict type is not shadowed
data={"red":100,"black":400,"green":300,"pink":500}
sel=Series(data)
print(sel)
Output:
red 100
black 400
green 300
pink 500
dtype: int64
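When a dict is converted, the keys become the index. As a hedged sketch, passing `index=` alongside the dict selects and orders the keys, and a label that is not a key (the made-up `"white"` here) comes out as NaN. The names `prices` and `subset` are illustrative.

```python
import pandas as pd

prices = {"red": 100, "black": 400, "green": 300, "pink": 500}

# index= selects and orders keys; labels missing from the dict become NaN
subset = pd.Series(prices, index=["pink", "red", "white"])
print(subset)

# NaN is a float, so the whole Series is promoted to float64
print(subset.dtype)  # float64
```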
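Selecting data from a Series: a single item with [], several non-contiguous items with a list [[ , ]], and a contiguous range with a slice [ : ]
sel=Series(data=[1,2,3,4],index=list("abcd"))
print("-"*30)
# A Series supports both label-based and position-based access.
# Use .iloc for positions: integer keys in [] on a non-integer index are deprecated as positional lookups.
print("label",sel["c"])
print("-"*30)
print("position",sel.iloc[2])
print("-"*30)
# Select non-contiguous items:
print("label",sel[["a","c"]])
print("-"*30)
print("position",sel.iloc[[1,2]])
print("-"*30)
# Select with slices (a position slice excludes its end, a label slice includes it):
print("position slice",sel.iloc[1:3])
print("-"*30)
print("label slice",sel["b":"d"])
Output:
------------------------------
label 3
------------------------------
position 3
------------------------------
label a 1
c 3
dtype: int64
------------------------------
position b 2
c 3
dtype: int64
------------------------------
position slice b 2
c 3
dtype: int64
------------------------------
label slice b 2
c 3
d 4
dtype: int64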
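The difference between label and position access is easiest to see with the explicit accessors. The sketch below uses `.loc` for labels and `.iloc` for positions and highlights that a label slice includes its end while a position slice does not. Variable names are illustrative.

```python
import pandas as pd

sel = pd.Series([1, 2, 3, 4], index=list("abcd"))

by_label = sel.loc["c"]         # label lookup
by_position = sel.iloc[2]       # position lookup

label_slice = sel.loc["b":"c"]  # label slices include the end label
pos_slice = sel.iloc[1:2]       # position slices exclude the end position

print(by_label, by_position)    # 3 3
print(list(label_slice.index))  # ['b', 'c']
print(list(pos_slice.index))    # ['b']
```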
Reassigning the index:
sel.index=list("dcba")
print(sel)
# reindex rearranges the data to match a new index and returns a new Series; labels absent from the original get NaN
print(sel.reindex(["b","a","c","d","e"]))
Output:
d 1
c 2
b 3
a 4
dtype: int64
b 3.0
a 4.0
c 2.0
d 1.0
e NaN
dtype: float64
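If NaN is not wanted for new labels, `reindex` accepts a `fill_value`. The following is a minimal sketch under the same data as above; the name `filled` is illustrative. Because no NaN is introduced, the integer dtype is preserved, and the original Series is left unchanged.

```python
import pandas as pd

sel = pd.Series([1, 2, 3, 4], index=list("dcba"))

# fill_value replaces the NaN that a new label would otherwise get
filled = sel.reindex(["b", "a", "c", "d", "e"], fill_value=0)

print(filled.tolist())     # [3, 4, 2, 1, 0]
print(filled.dtype)        # int64
print("e" in sel.index)    # False: reindex returned a new Series
```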
drop removes the entries with the given labels from an axis:
sel=pd.Series(range(10,15))
print(sel)
print(sel.drop([2,3]))
Output:
0 10
1 11
2 12
3 13
4 14
dtype: int64
0 10
1 11
4 14
dtype: int64
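Two details of `drop` are worth a quick sketch: it returns a new Series rather than modifying the original, and a label that does not exist raises `KeyError` unless `errors="ignore"` is passed. The label `99` below is a made-up missing label used only for illustration.

```python
import pandas as pd

sel = pd.Series(range(10, 15))

dropped = sel.drop([2, 3])
print(len(sel), len(dropped))  # 5 3  (the original is untouched)

# 99 is not in the index; errors="ignore" skips it instead of raising
safe = sel.drop([2, 99], errors="ignore")
print(list(safe.index))        # [0, 1, 3, 4]
```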
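(2) Arithmetic on Series: operations are aligned on the index, so operators such as +, -, * and / can be applied to two Series.
NumPy arrays are supported as operands as well.
pandas pairs up the values that share the same index label. When the two indexes differ, the result is float64, because the unmatched labels hold NaN, which is a floating-point value.
# If a label is not present in both Series, the result at that label is NaN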
series1=pd.Series([1,2,3,4],["London","HongKong","Humbai","lagos"])
series2=pd.Series([1,3,6,4],["London","Accra","lagos","Delhi"])
print(series1-series2)
print("-"*30)
print(series1+series2)
print("-"*30)
print(series1*series2)
Output:
Accra NaN
Delhi NaN
HongKong NaN
Humbai NaN
London 0.0
lagos -2.0
dtype: float64
------------------------------
Accra NaN
Delhi NaN
HongKong NaN
Humbai NaN
London 2.0
lagos 10.0
dtype: float64
------------------------------
Accra NaN
Delhi NaN
HongKong NaN
Humbai NaN
London 1.0
lagos 24.0
dtype: float64
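To avoid NaN for labels that appear in only one Series, the method form of the operator takes a `fill_value`. This sketch reuses the two Series above with `Series.add`, and also shows a NumPy array operand, which is combined element by element by position. The names `total` and `scaled` are illustrative.

```python
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], ["London", "HongKong", "Humbai", "lagos"])
series2 = pd.Series([1, 3, 6, 4], ["London", "Accra", "lagos", "Delhi"])

# A label missing from one side is treated as 0 instead of producing NaN
total = series1.add(series2, fill_value=0)
print(total)

# A NumPy array has no labels, so it is applied by position
scaled = series1 * np.array([10, 20, 30, 40])
print(scaled.tolist())  # [10, 40, 90, 160]
```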
sel=Series(data=[1,2,3,0],index=list("abcd"))
print(sel[sel>3])  # boolean filtering: keep values greater than 3 (none here, so the result is empty)
print("-"*30)
print(sel*2)  # scalar multiplication
Output:
Series([], dtype: int64)
------------------------------
a 2
b 4
c 6
d 0
dtype: int64
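Boolean filtering works in two steps: a comparison produces a boolean Series (the mask), and indexing with that mask keeps the rows where it is True. A minimal sketch with thresholds chosen only for illustration:

```python
import pandas as pd

sel = pd.Series([1, 2, 3, 0], index=list("abcd"))

mask = sel > 1               # boolean Series with the same index
filtered = sel[mask]         # keeps only the rows where mask is True

print(mask.tolist())         # [False, True, True, False]
print(filtered.to_dict())    # {'b': 2, 'c': 3}

# Combine conditions with & or |, each wrapped in parentheses
both = sel[(sel > 0) & (sel < 3)]
print(both.to_dict())        # {'a': 1, 'b': 2}
```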
(3) Creating a DataFrame
# From a 2-D array:
df1=DataFrame(np.random.randint(0,10,(4,4)),index=[1,2,3,4],columns=["a","b","c","d"])
print(df1)
Output (the values are random, so they differ on every run):
   a  b  c  d
1  3  5  1  3
2  6  1  9  2
3  5  7  2  5
4  1  3  5  1
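From a dict:
# Each key becomes a column name; data is used as the name so the built-in dict is not shadowed
data={"province":["Guangdong","Beijing","Qinghai","Fujian"],
"pop":[1.3,2.5,1.1,0.7],
"year":[2018,2018,2018,2018]
}
df2=pd.DataFrame(data,index=[1,2,3,4])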
print(df2)
Output:
    province  pop  year
1  Guangdong  1.3  2018
2    Beijing  2.5  2018
3    Qinghai  1.1  2018
4     Fujian  0.7  2018
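A DataFrame built this way ties back to Series: each column is a Series sharing the frame's index, and each row is a Series indexed by the column names. The sketch below assumes the same data as above. Note that the `"pop"` column has to be read with brackets, because `df.pop` refers to the DataFrame method of that name.

```python
import pandas as pd

data = {
    "province": ["Guangdong", "Beijing", "Qinghai", "Fujian"],
    "pop": [1.3, 2.5, 1.1, 0.7],
    "year": [2018, 2018, 2018, 2018],
}
df = pd.DataFrame(data, index=[1, 2, 3, 4])

pops = df["pop"]   # one column: a Series indexed 1..4
row = df.loc[2]    # one row: a Series indexed by column name

print(type(pops).__name__)  # Series
print(row["province"])      # Beijing
print(df.shape)             # (4, 3)
```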