In [3]: t = pd.Series([1,2,3,4])
In [4]: type(t)
Out[4]: pandas.core.series.Series
In [5]: t
Out[5]:
0 1
1 2
2 3
3 4
dtype: int64
In [6]: t2 = pd.Series([1,2,3,4],index=list("abcd"))
In [7]: t2
Out[7]:
a 1
b 2
c 3
d 4
dtype: int64
In [9]: temp_dict = {"name":"xiaoming","age":30,"tel":10086}
In [10]: t3 = pd.Series(temp_dict)
In [11]: t3
Out[11]:
name xiaoming
age 30
tel 10086
dtype: object
In [12]: t3.dtype
Out[12]: dtype('O')
In [13]: t2.dtype
Out[13]: dtype('int64')
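The three ways of building a Series shown in the session above can be collected in one short script. This is a minimal sketch; the variable names are illustrative and not part of the original session.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1.
s_list = pd.Series([1, 2, 3, 4])

# From a list with explicit labels.
s_labeled = pd.Series([1, 2, 3, 4], index=list("abcd"))

# From a dict: keys become the index; mixed value types give dtype object.
s_dict = pd.Series({"name": "xiaoming", "age": 30, "tel": 10086})

print(s_list.index.tolist())
print(s_labeled.index.tolist())
print(s_dict.dtype)
```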
Changing the data type:
t2.astype(int)
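As a small illustration of astype, the sketch below converts an integer Series to float. The names s and s_float are assumptions made for this example; the point is that astype returns a new Series and leaves the original untouched.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=list("abcd"))

# astype returns a new Series; s itself keeps its integer dtype.
s_float = s.astype(float)

print(s.dtype, s_float.dtype)
```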
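Slicing and indexing
Indexing
t3["age"]
t3[2]
Slicing
t3[:2] # contiguous
t3[[0,2]] # non-contiguous
t3[["age","tel"]]
Forcing a lookup of labels that are not in the index gives NaN for them (older pandas; since pandas 1.0 a list containing missing labels raises KeyError, and reindex is the way to get NaN)
t[t>4] # select the values greater than 4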
In [15]: t3["age"]
Out[15]: 30
In [16]: t3[1]
Out[16]: 30
In [21]: t3[:2]
Out[21]:
name xiaoming
age 30
dtype: object
In [22]: t3[[0,2]]
Out[22]:
name xiaoming
tel 10086
dtype: object
In [23]: t3[["age","tel"]]
Out[23]:
age 30
tel 10086
dtype: object
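The lookups in the session above rely on plain square brackets, which accept both labels and positions. Recent pandas releases warn when an integer key is treated as a position on a label-indexed Series, so the sketch below spells out the same selections with .loc (labels) and .iloc (positions). The variable names and the label "missing_label" are hypothetical.

```python
import pandas as pd

s = pd.Series({"name": "xiaoming", "age": 30, "tel": 10086})

by_label = s.loc["age"]          # label-based lookup
by_position = s.iloc[1]          # position-based lookup
first_two = s.iloc[:2]           # positional slice, end excluded
picked = s.loc[["age", "tel"]]   # several labels at once

# Boolean mask on a numeric Series: keep the values greater than 4.
nums = pd.Series([1, 2, 3, 4, 5, 6])
large = nums[nums > 4]

# reindex fills labels that are absent from the index with NaN.
with_missing = s.reindex(["age", "missing_label"])

print(first_two)
print(large)
print(with_missing)
```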
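Getting the index
t3.index
The returned object is iterable
It can be converted to another type, for example with list()
Getting the values
t3.values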
In [24]: t3.index
Out[24]: Index(['name', 'age', 'tel'], dtype='object')
In [25]: for i in t3.index:
    ...:     print(i)
    ...:
name
age
tel
In [26]: type(t3.index)
Out[26]: pandas.core.indexes.base.Index
In [27]: list(t3.index)
Out[27]: ['name', 'age', 'tel']
In [28]: t3.values
Out[28]: array(['xiaoming', 30, 10086], dtype=object)
In [29]: type(t3.values)
Out[29]: numpy.ndarray
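A compact version of the index and values handling above, as a sketch. It uses to_numpy(), which returns the same data as .values, and items() to walk labels and values together; the variable names are assumptions for this example.

```python
import pandas as pd

s = pd.Series({"name": "xiaoming", "age": 30, "tel": 10086})

labels = list(s.index)   # Index converted to a plain Python list
values = s.to_numpy()    # NumPy array holding the same data as s.values

# Iterate over (label, value) pairs.
for label, value in s.items():
    print(label, value)
```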
In [37]: pd.DataFrame(np.arange(12).reshape(3,4))
Out[37]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
In [38]: pd.DataFrame(np.arange(12).reshape(3,4),index=list("abc"),columns=list("efgh"))
Out[38]:
e f g h
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
In [40]: d1 = {"name":["xiaoming","xiaohong","xiaogang"],"age":[20,30,40],"tel":[10086,1008611,10010]}
In [42]: t1 = pd.DataFrame(d1)
In [43]: t1
Out[43]:
name age tel
0 xiaoming 20 10086
1 xiaohong 30 1008611
2 xiaogang 40 10010
In [44]: type(t1)
Out[44]: pandas.core.frame.DataFrame
In [45]:
In [45]: d2 = [{"name":"xiaohong","age":20,"tel":10086},{"name":"xiaogang","age":25},{"name":"xiaoming","tel":10010}]
In [46]: pd.DataFrame(d2)
Out[46]:
name age tel
0 xiaohong 20.0 10086.0
1 xiaogang 25.0 NaN
2 xiaoming NaN 10010.0
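The constructions above can be reproduced in a script. The sketch below builds one DataFrame from a 2-D array with explicit labels and one from a list of dicts, then counts the missing values that appear where a record lacks a key. The names df_arr, records and df_rec are illustrative.

```python
import numpy as np
import pandas as pd

# From a 2-D array with explicit row and column labels.
df_arr = pd.DataFrame(np.arange(12).reshape(3, 4),
                      index=list("abc"), columns=list("efgh"))

# From a list of dicts; keys missing from a record become NaN,
# which turns the affected integer columns into float64.
records = [
    {"name": "xiaohong", "age": 20, "tel": 10086},
    {"name": "xiaogang", "age": 25},
    {"name": "xiaoming", "tel": 10010},
]
df_rec = pd.DataFrame(records)

print(df_rec.dtypes)
print(df_rec.isna().sum())   # number of missing values per column
```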
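DataFrame attributes and methods
t2.index
Row index
t2.columns
Column index
t2.shape
Shape of the data
t2.dtypes
Data type of each column
t2.ndim
Number of dimensions
t2.head(x)
Shows the first x rows; the default is the first 5 rows
t2.tail(x)
Shows the last x rows; the default is the last 5 rows
t2.info()
Shows an overview of the data
t2.describe()
Quick summary statistics for the numeric columns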
In [58]: t2
Out[58]:
name age tel
0 xiaohong 20.0 10086.0
1 xiaogang 25.0 NaN
2 xiaoming NaN 10010.0
In [51]: t2.index
Out[51]: RangeIndex(start=0, stop=3, step=1)
In [52]: t2.columns
Out[52]: Index(['name', 'age', 'tel'], dtype='object')
In [53]: t2.values
Out[53]:
array([['xiaohong', 20.0, 10086.0],
['xiaogang', 25.0, nan],
['xiaoming', nan, 10010.0]], dtype=object)
In [54]: t2.shape
Out[54]: (3, 3)
In [56]: t2.dtypes
Out[56]:
name object
age float64
tel float64
dtype: object
In [57]: t2.ndim
Out[57]: 2
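The methods from the list that the session does not exercise (head, tail, info, describe) can be tried on the same data. A minimal sketch, with df standing in for the t2 frame used above:

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "xiaohong", "age": 20, "tel": 10086},
    {"name": "xiaogang", "age": 25},
    {"name": "xiaoming", "tel": 10010},
])

first_rows = df.head(2)   # first 2 rows
last_rows = df.tail(1)    # last row
df.info()                 # prints column names, non-null counts and dtypes
stats = df.describe()     # count, mean, std, min, quartiles, max (numeric columns only)

print(stats)
```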
Sorting a DataFrame
Read the data: df = pd.read_csv()
df.sort_values(by="column_name")
Ascending order by default
Set ascending=False for descending order
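A short sketch of sort_values in both directions. The data here is made up for illustration and built in memory rather than read from a CSV file.

```python
import pandas as pd

df = pd.DataFrame({"name": ["xiaoming", "xiaohong", "xiaogang"],
                   "age": [30, 20, 40]})

asc = df.sort_values(by="age")                     # ascending (default)
desc = df.sort_values(by="age", ascending=False)   # descending

# sort_values returns a new DataFrame; df keeps its original order.
print(asc)
print(desc)
```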
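Reading files
pd.read_csv()
Reads a local file
pd.read_clipboard()
Reads from the clipboard
pd.read_excel()
pd.read_html()
pd.read_json()
pd.read_sql()
Reads data from a SQL database such as MySQL
First argument: the SQL statement
Second argument: the database connection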
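To show the argument order of read_csv and read_sql without needing external files or a running server, the sketch below reads CSV text from an in-memory buffer and queries an in-memory SQLite database. SQLite is a stand-in chosen for this example, not the MySQL setup the notes mention; the table name people and its contents are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts a path or any file-like object.
csv_text = "name,age\nxiaoming,20\nxiaohong,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Build a throwaway in-memory database to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("xiaoming", 20), ("xiaohong", 30)])
conn.commit()

# First argument: the SQL statement; second argument: the connection.
df_sql = pd.read_sql("SELECT name, age FROM people", conn)
conn.close()

print(df_csv)
print(df_sql)
```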
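Selecting rows and columns in pandas
Take the first 10 rows
df[:10]
Take one column of the first 10 rows
df[:10]["column_name"]
Inside the square brackets []
A numeric slice: selects rows and operates on rows
A string: selects a column by its label
Data type of a single selected column
<class 'pandas.core.series.Series'>
loc
Selects data by label
Slices with : are allowed and include both the start and the end
iloc
Selects data by position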
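The rules above can be checked on a small frame. This sketch contrasts plain bracket selection with loc and iloc, and shows that a label slice includes its end while a position slice does not. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

rows = df[:2]        # a slice inside [] selects rows
col = df["Y"]        # a string inside [] selects a column (a Series)
both = df[:2]["Y"]   # column Y of the first two rows

by_label = df.loc["a":"b", "W":"X"]   # label slices include both ends
by_pos = df.iloc[0:2, 0:2]            # position slices exclude the end

print(type(col))
print(by_label)
print(by_pos)
```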
"""loc"""
In [60]: t3 = pd.DataFrame(np.arange(12).reshape(3,4),index=list("abc"),columns=list("WXYZ"))
In [61]: t3
Out[61]:
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
In [62]: t3.loc["a","Z"] # row a, column Z
Out[62]: 3
In [63]: type(t3.loc["a","Z"])
Out[63]: numpy.int32
In [64]: t3.loc["a"] # select row a
Out[64]:
W 0
X 1
Y 2
Z 3
Name: a, dtype: int32
In [65]: t3.loc["a",:] # select row a
Out[65]:
W 0
X 1
Y 2
Z 3
Name: a, dtype: int32
In [66]: t3.loc[:,"Y"] # select a column
Out[66]:
a 2
b 6
c 10
Name: Y, dtype: int32
In [67]: t3.loc[["a","c"]] # select multiple rows
Out[67]:
W X Y Z
a 0 1 2 3
c 8 9 10 11
In [69]: t3.loc[:,["W","Z"]] # select multiple columns
Out[69]:
W Z
a 0 3
b 4 7
c 8 11
In [70]: t3.loc[["a","c"],["W","Z"]] # select multiple rows and columns
Out[70]:
W Z
a 0 3
c 8 11
In [73]: t3.loc["a":"c",["W","Z"]] # slices are allowed; both endpoints are included
Out[73]:
W Z
a 0 3
b 4 7
c 8 11
"""iloc"""
In [78]: t3
Out[78]:
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
In [79]: t3.iloc[1] # select a row
Out[79]:
W 4
X 5
Y 6
Z 7
Name: b, dtype: int32
In [80]: t3.iloc[1,:] # select a row
Out[80]:
W 4
X 5
Y 6
Z 7
Name: b, dtype: int32
In [81]: t3.iloc[:,2]
Out[81]:
a 2
b 6
c 10
Name: Y, dtype: int32
In [82]: t3.iloc[:,2] # select a column
Out[82]:
a 2
b 6
c 10
Name: Y, dtype: int32
In [83]: t3.iloc[:,[1,2]] # select multiple columns
Out[83]:
X Y
a 1 2
b 5 6
c 9 10
In [85]: t3.iloc[[0,1],[1,2]] # select multiple rows and columns
Out[85]:
X Y
a 1 2
b 5 6
In [87]: t3.iloc[1:,:2] # every row after the first, first two columns
Out[87]:
W X
b 4 5
c 8 9
In [88]: t3.iloc[1:,:2] = 30 # assignment
In [89]: t3
Out[89]:
W X Y Z
a 0 1 2 3
b 30 30 6 7
c 30 30 10 11
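The assignment at the end of the session modifies the frame in place. The sketch below repeats it and adds a single-cell assignment through loc for comparison; the value 100 and the choice of cell are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

# Assign one scalar to a block selected by position:
# every row after the first, first two columns.
df.iloc[1:, :2] = 30

# Assign to a single cell selected by label.
df.loc["a", "Z"] = 100

print(df)
```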