Data source formats that can be used to construct a Pandas DataFrame

Article link: https://blog.youkuaiyun.com/weixin_37855575/article/details/94466994

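This article walks through how to build a Pandas DataFrame from various data formats: a 2D ndarray; a dict of arrays, lists, or tuples; a NumPy structured/record array; a dict of Series; a dict of dicts; a list of dicts or Series; and a list of lists or tuples. Each data source has its own requirements and use cases. For example, in a dict of arrays, lists, or tuples all sequences must have the same length, and in a dict of dicts the outer keys become the columns while the inner keys become the row index.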


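(Image not preserved.)
The image shown here in the original post was taken from the Chinese translation of Python for Data Analysis. Thanks to the translator.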

Data formats for constructing a DataFrame

import pandas as pd
import numpy as np

2D ndarray

arr2d = np.random.randint(0,9,size=(5,4))
arr2d

array([[6, 3, 0, 4],
       [2, 5, 5, 0],
       [3, 7, 6, 5],
       [8, 3, 7, 6],
       [8, 4, 2, 6]])

df_arr2d = pd.DataFrame(arr2d)
df_arr2d

	0	1	2	3
0	6	3	0	4
1	2	5	5	0
2	3	7	6	5
3	8	3	7	6
4	8	4	2	6

cols = ['A','B','C','D']
idx = ['a','b','c','d','e']

df_arr2d = pd.DataFrame(arr2d,index=pd.Index(idx),columns=cols)
# or, equivalently, pass the plain list as the index:
# df_arr2d = pd.DataFrame(arr2d,index=idx,columns=cols)
df_arr2d

	A	B	C	D
a	6	3	0	4
b	2	5	5	0
c	3	7	6	5
d	8	3	7	6
e	8	4	2	6

Dict of arrays, lists, or tuples

All sequences must have the same length.
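
To illustrate the length requirement, here is a minimal sketch of what happens when it is violated. The dict `bad` and its values are made up for this illustration, and the exact wording of the error message can differ between pandas versions, so it is only printed, not quoted.

```python
import pandas as pd

# Hypothetical data: column 'B' is one element shorter than column 'A'.
bad = {'A': [1, 2, 3], 'B': [4, 5]}

try:
    pd.DataFrame(bad)
    error_message = None
except ValueError as exc:
    # pandas rejects plain sequences of unequal length with a ValueError.
    error_message = str(exc)

print(error_message)
```

If columns of different lengths are really needed, wrapping each one in a Series (see the "Dict of Series" section) lets pandas pad the shorter ones with NaN instead.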

Dict of arrays

import random

arr1 = np.random.randint(0,9,size=10)
arr2 = np.random.randint(0,9,size=10)
arr3 = np.random.randint(0,9,size=10)

dic_array = {'A':arr1,'B':arr2,'C':arr3}
DF_dic_array = pd.DataFrame(dic_array)
DF_dic_array

	A	B	C
0	1	4	2
1	2	2	1
2	0	2	2
3	6	3	1
4	2	4	4
5	3	4	5
6	3	3	4
7	4	8	8
8	6	4	7
9	3	4	2

Dict of lists

L1 = [1,2,3,4,5]
L2 = [6,7,8,9,0]
L3 = ['a','b','c','d','e']
dict_list = {'A':L1,'B':L2,'C':L3}

DF_dict_list = pd.DataFrame(dict_list)
DF_dict_list

	A	B	C
0	1	6	a
1	2	7	b
2	3	8	c
3	4	9	d
4	5	0	e

Dict of tuples

t1 = (1,2,3,4,5)
t2 = (6,7,8,9,0)
t3 = ('a','b','c','d','e')
dict_tuple = {'A':t1, 'B':t2, 'C':t3}

DF_dict_t1 = pd.DataFrame(dict_tuple)
DF_dict_t1

	A	B	C
0	1	6	a
1	2	7	b
2	3	8	c
3	4	9	d
4	5	0	e

NumPy structured/record array
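
The original post gives no example for this case. As a minimal sketch, a structured array is handled much like a dict of arrays: each named field becomes a column. The field names (`A`, `B`, `C`), dtypes, and values below are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical field names and values, chosen only for illustration.
rec = np.array(
    [(1, 2.5, 'a'), (2, 3.5, 'b'), (3, 4.5, 'c')],
    dtype=[('A', 'i4'), ('B', 'f8'), ('C', 'U1')],
)

# Each named field of the structured array becomes one column.
df_rec = pd.DataFrame(rec)
print(df_rec)
```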

Dict of Series

s1 = pd.Series([1,2,3,4,5])
s2 = pd.Series(['a','b','c','d','e','f'])
s3 = pd.Series([6,7,8,9,0])

dict_s = {'A':s1,'B':s2,'C':s3}
DF_dict_s = pd.DataFrame(dict_s)
DF_dict_s

	A	B	C
0	1.0	a	6.0
1	2.0	b	7.0
2	3.0	c	8.0
3	4.0	d	9.0
4	5.0	e	0.0
5	NaN	f	NaN
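
In the output above, `s2` has one more element than `s1` and `s3`, so the missing positions in columns A and C are filled with NaN (which is also why those columns show up as floats). The sketch below uses made-up index labels to show that the matching is done on index labels rather than on position.

```python
import pandas as pd

# Hypothetical Series with partially overlapping index labels.
s_x = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_y = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched on index labels; a label missing from one Series gives NaN.
df_aligned = pd.DataFrame({'X': s_x, 'Y': s_y})
print(df_aligned)
```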

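Dict of dicts

The keys of the outer dict become the columns, and the keys of the inner dicts become the row index.
The inner keys are merged (unioned) to form the final index. In the output below they also appear sorted; newer pandas versions may instead keep the order in which the keys first appear.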

d1 = {2004:6,2001:7,2002:8,2005:9,2003:0}
d2 = {2001:1,2002:2,2003:3,2004:4,2005:5}
d3 = {2005:'a',2001:'b',2002:'d',2004:'c',2003:'e'}

dict_d = {'A':d1,'B':d2,'C':d3}

DF_dict_d = pd.DataFrame(dict_d)
DF_dict_d

	A	B	C
2001	7	1	b
2002	8	2	d
2003	0	3	e
2004	6	4	c
2005	9	5	a
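
The example above uses the same keys in every inner dict. As a complementary sketch, the hypothetical dict `sales` below has inner dicts that only partly overlap, which shows the union of the inner keys and the NaN fill. The explicit `sort_index()` is there because the row order for nested dicts may depend on the pandas version.

```python
import pandas as pd

# Hypothetical inner dicts that do not share all of their keys.
sales = {
    'A': {2001: 7, 2002: 8},
    'B': {2002: 2, 2003: 3},
}

df_sales = pd.DataFrame(sales)
# Sort explicitly so the row order does not depend on the pandas version.
df_sales = df_sales.sort_index()
print(df_sales)
```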

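List of dicts or Series

Each item becomes one row of the DataFrame.
The union of the dict keys or Series indexes becomes the DataFrame's column labels.

List of dicts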

d1 = {2001:1,2002:2,2003:3,2004:4,2005:5}
d2 = {2003:6,2002:7,2001:8,2004:9,2005:0}
d3 = {2001:'a',2002:'b',2003:'c',2004:'d',2005:'e'}

List_d = [d1,d2,d3]
List_d

[{2001: 1, 2002: 2, 2003: 3, 2004: 4, 2005: 5},
 {2003: 6, 2002: 7, 2001: 8, 2004: 9, 2005: 0},
 {2001: 'a', 2002: 'b', 2003: 'c', 2004: 'd', 2005: 'e'}]

DF_list_d = pd.DataFrame(List_d)
DF_list_d

	2001	2002	2003	2004	2005
0	1	2	3	4	5
1	8	7	6	9	0
2	a	b	c	d	e
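
All three dicts above share the same keys. The sketch below, with made-up records, shows the "union of keys" rule more directly: a key that is missing from a record produces NaN in that row.

```python
import pandas as pd

# Hypothetical records; the second one lacks 'y' and adds 'z'.
records = [
    {'x': 1, 'y': 2},
    {'x': 3, 'z': 4},
]

# Each dict is one row; the columns are the union of all keys.
df_records = pd.DataFrame(records)
print(df_records)
```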

List of Series

s1 = pd.Series(np.random.randint(0,4,size=(5)))
s2 = pd.Series(np.random.randint(5,9,size=(5)))
s3 = pd.Series(['a','b','c','d','e'])

List_s = [s1,s2,s3]

DF_list_s = pd.DataFrame(List_s)
DF_list_s

	0	1	2	3	4
0	2	0	0	1	1
1	8	5	8	7	8
2	a	b	c	d	e

List of lists or tuples

Treated like the "2D ndarray" case

List of lists

l1 = [1,2,3,4,5]
l2 = [6,7,8,9,0]
l3 = ['a','b','c','d','e']

ll = [l1,l2,l3]

DF_ll = pd.DataFrame(ll)

DF_ll

	0	1	2	3	4
0	1	2	3	4	5
1	6	7	8	9	0
2	a	b	c	d	e

List of tuples

t1 = (1,2,3,4,5)
t2 = (6,7,8,9,0)
t3 = ('a','b','c','d','e')

LT = [t1,t2,t3]

DF_LT = pd.DataFrame(LT)
DF_LT

	0	1	2	3	4
0	1	2	3	4	5
1	6	7	8	9	0
2	a	b	c	d	e
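
Because this case behaves like a 2D ndarray, `index` and `columns` can be passed in the same way. The following is a small sketch with made-up rows and labels; here each tuple is one row, so a column holding only integers keeps an integer dtype.

```python
import pandas as pd

# Hypothetical rows; each tuple is one row, as with a 2D ndarray.
rows = [(1, 'a'), (2, 'b'), (3, 'c')]

# Row and column labels are supplied just like in the ndarray example.
df_rows = pd.DataFrame(rows, index=['r1', 'r2', 'r3'], columns=['num', 'letter'])
print(df_rows)
```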

Another DataFrame

arr1 = np.array([[1,2,3,4,5],
                [6,7,8,9,0],
                ['a','b','c','d','e']])
arr1

array([['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '0'],
       ['a', 'b', 'c', 'd', 'e']], dtype='<U11')

DF1 = pd.DataFrame(arr1)
DF1

	0	1	2	3	4
0	1	2	3	4	5
1	6	7	8	9	0
2	a	b	c	d	e

DF2 = pd.DataFrame(DF1)
DF2

	0	1	2	3	4
0	1	2	3	4	5
1	6	7	8	9	0
2	a	b	c	d	e
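
When the source is another DataFrame, its own index and columns are reused unless different ones are passed. As a hedged sketch with a made-up frame `df_src`, passing `index` and `columns` picks out the matching labels from the source:

```python
import pandas as pd

# Hypothetical source frame.
df_src = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Passing index/columns selects (reindexes to) those labels from the source.
df_sub = pd.DataFrame(df_src, index=[0, 2], columns=['A', 'C'])
print(df_sub)
```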

NumPy MaskedArray

I have not looked into MaskedArray yet.
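
For reference, here is a minimal sketch of this case. It is not from the original post, and the data, mask, and column names are assumptions. A masked array is handled like a 2D ndarray, except that masked entries are expected to become missing values (NaN) in the resulting DataFrame.

```python
import numpy as np
import numpy.ma as ma
import pandas as pd

# Hypothetical 2x3 float data with two masked (invalid) entries.
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = np.array([[False, True, False], [False, False, True]])
marr = ma.masked_array(data, mask=mask)

# Masked entries are expected to show up as NaN in the DataFrame.
df_masked = pd.DataFrame(marr, columns=['A', 'B', 'C'])
print(df_masked)
```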