
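The image above is taken from the Chinese translation of Python for Data Analysis (利用Python进行数据分析). Thanks to the translator.
Data formats accepted by the DataFrame constructor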
import pandas as pd
import numpy as np
2D ndarray
arr2d = np.random.randint(0,9,size=(5,4))
arr2d
array([[6, 3, 0, 4],
       [2, 5, 5, 0],
       [3, 7, 6, 5],
       [8, 3, 7, 6],
       [8, 4, 2, 6]])
df_arr2d = pd.DataFrame(arr2d)
df_arr2d
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 6 | 3 | 0 | 4 |
| 1 | 2 | 5 | 5 | 0 |
| 2 | 3 | 7 | 6 | 5 |
| 3 | 8 | 3 | 7 | 6 |
| 4 | 8 | 4 | 2 | 6 |
cols = ['A','B','C','D']
idx = ['a','b','c','d','e']
df_arr2d = pd.DataFrame(arr2d,index=pd.Index(idx),columns=cols)
df_arr2d
|   | A | B | C | D |
|---|---|---|---|---|
| a | 6 | 3 | 0 | 4 |
| b | 2 | 5 | 5 | 0 |
| c | 3 | 7 | 6 | 5 |
| d | 8 | 3 | 7 | 6 |
| e | 8 | 4 | 2 | 6 |
Dict of arrays, lists, or tuples
Each sequence becomes a column, and all sequences must have the same length.
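To see what the same-length rule means in practice, here is a minimal sketch. The dict `data` is made up for illustration. It assumes pandas raises a `ValueError` for unequal lengths; the exact message text may differ between versions.

```python
import pandas as pd

# Hypothetical columns of unequal length (3 values vs. 2 values).
data = {"A": [1, 2, 3], "B": [4, 5]}

try:
    pd.DataFrame(data)
    error_message = None
except ValueError as exc:
    # pandas refuses to build the frame from sequences of different lengths.
    error_message = str(exc)

print(error_message)

# One possible workaround: wrap each sequence in a Series,
# so that the shorter column is padded with NaN instead.
padded = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})
print(padded)
```

Wrapping the values in Series switches to the "dict of Series" behaviour, where the indexes are aligned rather than required to match.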
Dict of arrays
import random
arr1 = np.random.randint(0,9,size=10)
arr2 = np.random.randint(0,9,size=10)
arr3 = np.random.randint(0,9,size=10)
dic_array = {'A':arr1,'B':arr2,'C':arr3}
DF_dic_array = pd.DataFrame(dic_array)
DF_dic_array
|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 4 | 2 |
| 1 | 2 | 2 | 1 |
| 2 | 0 | 2 | 2 |
| 3 | 6 | 3 | 1 |
| 4 | 2 | 4 | 4 |
| 5 | 3 | 4 | 5 |
| 6 | 3 | 3 | 4 |
| 7 | 4 | 8 | 8 |
| 8 | 6 | 4 | 7 |
| 9 | 3 | 4 | 2 |
Dict of lists
L1 = [1,2,3,4,5]
L2 = [6,7,8,9,0]
L3 = ['a','b','c','d','e']
dict_list = {'A':L1,'B':L2,'C':L3}
DF_dict_list = pd.DataFrame(dict_list)
DF_dict_list
Dict of tuples
t1 = (1,2,3,4,5)
t2 = (6,7,8,9,0)
t3 = ('a','b','c','d','e')
dict_tuple = {'A':t1, 'B':t2, 'C':t3}
DF_dict_t1 = pd.DataFrame(dict_tuple)
DF_dict_t1
NumPy structured/record array
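The original notes give no example for this case, so here is a small sketch. The field names `id`, `value` and `label` are made up for illustration. The book describes this case as being treated like a dict of arrays, so each field should become a column.

```python
import numpy as np
import pandas as pd

# Hypothetical structured array with three named fields.
records = np.array(
    [(1, 2.5, "a"), (2, 3.5, "b")],
    dtype=[("id", "i4"), ("value", "f8"), ("label", "U1")],
)

# Each field name becomes a column label.
df_records = pd.DataFrame(records)
print(df_records)
```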
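Dict of Series
Each Series becomes a column. If no index is passed explicitly, the Series indexes are unioned to form the row index, and positions missing from a Series are filled with NaN.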
s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series(['a','b','c','d','e','f'])
s3 = pd.Series([6,7,8,9,0])
dict_s = {'A':s1,'B':s2,'C':s3}
DF_dict_s = pd.DataFrame(dict_s)
DF_dict_s
|   | A   | B | C   |
|---|-----|---|-----|
| 0 | 1.0 | a | 6.0 |
| 1 | 2.0 | b | 7.0 |
| 2 | 3.0 | c | 8.0 |
| 3 | 4.0 | d | 9.0 |
| 4 | 5.0 | e | 0.0 |
| 5 | NaN | f | NaN |
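Dict of dicts
The outer dict keys become the columns, and the inner keys become the row index.
The keys of the inner dicts are unioned to form the final index. In the output below they also come out sorted; newer pandas versions may keep the keys in first-seen order instead, so call sort_index() if the order matters.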
d1 = {2004:6,2001:7,2002:8,2005:9,2003:0}
d2 = {2001:1,2002:2,2003:3,2004:4,2005:5}
d3 = {2005:'a',2001:'b',2002:'d',2004:'c',2003:'e'}
dict_d = {'A':d1,'B':d2,'C':d3}
DF_dict_d = pd.DataFrame(dict_d)
DF_dict_d
|      | A | B | C |
|------|---|---|---|
| 2001 | 7 | 1 | b |
| 2002 | 8 | 2 | d |
| 2003 | 0 | 3 | e |
| 2004 | 6 | 4 | c |
| 2005 | 9 | 5 | a |
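The example above uses inner dicts that all share the same keys. Here is a minimal sketch of what I would expect when they do not. The dict `sales` is made up for illustration, and the index is sorted explicitly rather than relying on the constructor's ordering.

```python
import pandas as pd

# Hypothetical data: the inner dicts do not share all of their keys.
sales = {
    "A": {2003: 0, 2001: 7, 2002: 8},
    "B": {2001: 1, 2002: 2, 2004: 4},
}

# Outer keys -> columns, union of inner keys -> row index.
df_sales = pd.DataFrame(sales)

# Sort explicitly so the row order does not depend on the pandas version.
df_sales = df_sales.sort_index()
print(df_sales)
```

Keys that are missing from one inner dict show up as NaN in that column.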
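List of dicts or Series
Each item becomes one row of the DataFrame.
The union of the dict keys or Series indexes becomes the DataFrame's column labels.
List of dicts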
d1 = {2001:1,2002:2,2003:3,2004:4,2005:5}
d2 = {2003:6,2002:7,2001:8,2004:9,2005:0}
d3 = {2001:'a',2002:'b',2003:'c',2004:'d',2005:'e'}
List_d = [d1,d2,d3]
List_d
[{2001: 1, 2002: 2, 2003: 3, 2004: 4, 2005: 5},
 {2003: 6, 2002: 7, 2001: 8, 2004: 9, 2005: 0},
 {2001: 'a', 2002: 'b', 2003: 'c', 2004: 'd', 2005: 'e'}]
DF_list_d = pd.DataFrame(List_d)
DF_list_d
|   | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|------|------|------|------|------|
| 0 | 1    | 2    | 3    | 4    | 5    |
| 1 | 8    | 7    | 6    | 9    | 0    |
| 2 | a    | b    | c    | d    | e    |
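All three dicts above have the same keys, so the "union" part is not visible. Here is a small sketch with made-up records whose keys differ. I avoid relying on the column order, since I have not checked how it varies across versions.

```python
import pandas as pd

# Hypothetical records: each dict is one row, and the keys differ between rows.
records = [
    {"name": "x", "score": 1},
    {"name": "y", "level": 3},
]

df_records = pd.DataFrame(records)
print(df_records)
```

Each row gets NaN for the keys it does not have.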
List of Series
s1 = pd.Series(np.random.randint(0,4,size=(5)))
s2 = pd.Series(np.random.randint(5,9,size=(5)))
s3 = pd.Series(['a','b','c','d','e'])
List_s = [s1,s2,s3]
DF_list_s = pd.DataFrame(List_s)
DF_list_s
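The Series above all use the default index. Here is a sketch with made-up Series that have different indexes and a `name`. As far as I know, the names are used as row labels when no index is passed.

```python
import pandas as pd

# Hypothetical rows: the indexes overlap only partly, and each Series has a name.
row1 = pd.Series([1, 2], index=["x", "y"], name="r1")
row2 = pd.Series([3, 4], index=["y", "z"], name="r2")

# Each Series becomes one row; the union of their indexes becomes the columns.
df_rows = pd.DataFrame([row1, row2])
print(df_rows)
```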
List of lists or tuples
Treated like the "2D ndarray" case: each inner list or tuple becomes one row.
List of lists
l1 = [1,2,3,4,5]
l2 = [6,7,8,9,0]
l3 = ['a','b','c','d','e']
ll = [l1,l2,l3]
DF_ll = pd.DataFrame(ll)
DF_ll
List of tuples
t1 = (1,2,3,4,5)
t2 = (6,7,8,9,0)
t3 = ('a','b','c','d','e')
LT = [t1,t2,t3]
DF_LT = pd.DataFrame(LT)
DF_LT
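One difference from a real 2D ndarray that I noticed: a NumPy array has a single dtype, so mixed rows are coerced before pandas ever sees them, while a list of tuples keeps the original Python values. A minimal sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows mixing integers and strings.
rows = [(1, "a"), (2, "b")]

# Built from the list of tuples: the integers stay integers.
df_from_rows = pd.DataFrame(rows)

# Built from an ndarray: NumPy has already turned every element into a string.
df_from_array = pd.DataFrame(np.array(rows))

print(df_from_rows[0].tolist())
print(df_from_array[0].tolist())
```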
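Another DataFrame
# Mixing integers and strings makes NumPy coerce every element to a string (see the dtype in the output).
arr1 = np.array([[1,2,3,4,5],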
[6,7,8,9,0],
['a','b','c','d','e']])
arr1
array([['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '0'],
       ['a', 'b', 'c', 'd', 'e']], dtype='<U11')
DF1 = pd.DataFrame(arr1)
DF1
DF2 = pd.DataFrame(DF1)
DF2
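Passing a DataFrame on its own just gives back the same table as a new object. It gets more interesting when `columns` is passed as well. The sketch below uses a made-up frame `source` and assumes the constructor reindexes the existing columns to the requested labels.

```python
import pandas as pd

# Hypothetical source frame.
source = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Requesting existing labels in another order reorders the columns.
reordered = pd.DataFrame(source, columns=["B", "A"])

# Requesting a label the source does not have gives a column of NaN.
with_missing = pd.DataFrame(source, columns=["A", "Z"])

print(reordered)
print(with_missing)
```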
NumPy MaskedArray
I haven't looked into MaskedArray yet...
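For completeness, here is a small sketch of what this case seems to do. The book describes it as like the "2D ndarray" case, except that masked values come out as missing in the resulting DataFrame. The values and mask below are made up, and I use floats so no dtype conversion is needed to hold NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 float data with one masked (hidden) entry.
values = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[False, True], [False, False]])
masked = np.ma.masked_array(values, mask=mask)

# The masked entry should appear as NaN in the DataFrame.
df_masked = pd.DataFrame(masked, columns=["A", "B"])
print(df_masked)
```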