python - Add one row to pandas.DataFrame

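This post discusses how to add rows to a DataFrame in Python's pandas library. The author notes that although pandas is usually used with fully populated data, some situations call for adding rows one at a time. It lists several approaches, including `set_value`, `loc`, `append`, `concat`, and preallocating space, shows example code for each, and stresses performance, especially when handling large amounts of data.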



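As far as I understand, pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one. What is the best way to do this?

I successfully created an empty DataFrame:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field:

res = res.set_value(len(res), 'qty1', 10.0)

It works but seems very odd :-/ (it fails when adding a string value).

How can I add a new row to my DataFrame (with different column types)?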
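
For readers on current pandas: DataFrame.set_value was deprecated in 0.21 and removed in 1.0, so the snippet above no longer runs there. A minimal sketch of the same single-cell fill using .loc, assuming the question's column names (the string value "name" is made up for illustration):

```python
import pandas as pd

res = pd.DataFrame(columns=("lib", "qty1", "qty2"))

# Setting with enlargement: the row label len(res) does not exist yet,
# so .loc creates the row and leaves the other columns missing.
res.loc[len(res), "qty1"] = 10.0

# A string goes into the same row through its (now existing) label.
res.loc[0, "lib"] = "name"

print(res)
```

This only fills individual cells; the answers below cover adding whole rows.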

PhE asked 2019-01-25T09:34:12Z

18 answers

305 votes

An example based on @Nasser's answer:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     # randint's upper bound is exclusive, so (-1, 2) draws from -1, 0, 1
...     df.loc[i] = [np.random.randint(-1, 2) for n in range(3)]
...
>>> print(df)
  lib qty1 qty2
0   0    0   -1
1  -1   -1    1
2   1   -1    1
3   0    0    0
4   1   -1   -1

[5 rows x 3 columns]

fred answered 2019-01-25T09:34:24Z

231 votes

You can use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
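
Of the two, only pandas.concat() still exists on current releases: DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the concat route, assuming the rows arrive as small frames (the `chunks` list and its values are hypothetical; the column names reuse the question's):

```python
import pandas as pd

# Hypothetical pieces, e.g. results arriving batch by batch.
chunks = []
for batch in range(3):
    chunk = pd.DataFrame(
        {"lib": [f"item{batch}"], "qty1": [float(batch)], "qty2": [batch * 2.0]}
    )
    chunks.append(chunk)

# One concat at the end instead of growing the frame inside the loop.
df = pd.concat(chunks, ignore_index=True)
print(df)
```

Collecting the pieces in a plain list and concatenating once avoids copying the whole frame on every iteration.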

NPE answered 2019-01-25T09:34:48Z

229 votes

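If you can get all the data for the data frame upfront, there is a much faster approach than appending to a data frame:

Create a list of dictionaries in which each dictionary corresponds to an input data row.

Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 minutes, while creating the data frame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..)
    rows_list.append(dict1)

df = pd.DataFrame(rows_list)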
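
The snippet above is pseudocode. A runnable sketch of the same idea, assuming the input rows are simple tuples (`input_rows` and its values are invented here for illustration):

```python
import pandas as pd

# Hypothetical input: any iterable of raw records would do.
input_rows = [("a", 1.0, 2.0), ("b", 3.0, 4.0), ("c", 5.0, 6.0)]

rows_list = []
for lib, qty1, qty2 in input_rows:
    # One dict per row; the keys become the column names.
    rows_list.append({"lib": lib, "qty1": qty1, "qty2": qty2})

# Build the frame once, after all rows are collected.
df = pd.DataFrame(rows_list)
print(df)
```

Because each column is built in one pass, the string column and the float columns keep their own types.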

ShikharDua answered 2019-01-25T09:35:42Z

69 votes

If you know the number of entries in advance, you should preallocate the space by also providing the index (data example taken from a different answer):

import pandas as pd
import numpy as np

# we know we're gonna have 5 rows of data
numberOfRows = 5

# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2'))

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    # loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]

In[23]: df
Out[23]:
  lib qty1 qty2
0  -1   -1   -1
1   0    0    0
2  -1    0   -1
3   0   -1    0
4  -1    0    0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
1000 loops, best of 3: 1.23 ms per loop

In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
100 loops, best of 3: 2.31 ms per loop

And, from the comments, with a size of 6000 the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s
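
The two wrapper functions are not shown in the answer. One plausible reconstruction, useful for rerunning the comparison on your own machine, might look like this (the names `try_this` and `try_other`, the constants, and the use of `timeit` are assumptions; absolute timings will differ from the figures quoted above):

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ("lib", "qty1", "qty2")
N_ROWS = 5


def try_this():
    # Preallocate: the index already holds N_ROWS labels.
    df = pd.DataFrame(index=np.arange(N_ROWS), columns=COLUMNS)
    for x in range(N_ROWS):
        df.loc[x] = [np.random.randint(-1, 1) for _ in range(3)]
    return df


def try_other():
    # No preallocation: every assignment enlarges the frame.
    df = pd.DataFrame(columns=COLUMNS)
    for x in range(N_ROWS):
        df.loc[x] = [np.random.randint(-1, 1) for _ in range(3)]
    return df


for fn in (try_this, try_other):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

Raising N_ROWS is the quickest way to see how the gap grows with size.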

FooBar answered 2019-01-25T09:36:31Z

58 votes

For efficient appending, see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc/ix on non-existing key index data (.ix was removed in pandas 1.0, so use .loc on current versions). For example:

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                    columns=['A','B'])
   .....:

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

Nasser Al-Wohaibi answered 2019-01-25T09:37:08Z

51 votes

import pandas as pd

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1, 2], [3, 4], [5, 6]]
for row in rows:
    df.loc[len(df)] = row
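
One caveat worth sketching: using len(df) as the new label assumes the index runs 0..n-1 without gaps. If rows were dropped earlier, that label may already exist and the assignment overwrites a row instead of adding one. The frame below is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})
df = df.drop(index=1)      # index is now [0, 2], so len(df) == 2

df.loc[len(df)] = [7, 8]   # label 2 already exists: that row is overwritten
print(df)

# Resetting the index restores labels 0..n-1, so len(df) is a fresh label again.
df = df.reset_index(drop=True)
df.loc[len(df)] = [9, 10]
print(df)
```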

Lydia answered 2019-01-25T09:37:24Z

37 votes

You can append a single row as a dictionary using the ignore_index option.

>>> # DataFrame.append was removed in pandas 2.0; this session needs an older release
>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal  Color
0    cow   blue
1  horse    red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

W.P. McNeill answered 2019-01-25T09:37:54Z

32 votes

For a Pythonic way, here is my answer:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
# DataFrame.append requires pandas < 2.0
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

hkyi answered 2019-01-25T09:38:18Z

17 votes

It's been a long time, but I faced the same problem too, and found a lot of interesting answers here. So I was confused about which method to use.

When adding a lot of rows to a dataframe, I am interested in speed. So I tried the 3 most popular methods and checked their speed.

Speed performance

Using .append (NPE's answer)

Using .loc (fred's answer and FooBar's answer)

Using dict and creating the DataFrame at the end (ShikharDua's answer)

Results (in seconds):

Adding     1000 rows   5000 rows   10000 rows
.append    1.04        4.84        9.56
.loc       1.16        5.59        11.50
dict       0.23        0.26        0.34

So I use addition through the dictionary for myself.

Code:

import pandas
import numpy
import time

numOfRows = 10000

# Method 1: .append (requires pandas < 2.0, where DataFrame.append still exists)
startTime = time.perf_counter()
df1 = pandas.DataFrame(numpy.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range(1, numOfRows):
    df1 = df1.append(dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

# Method 2: .loc with a new index label per row
startTime = time.perf_counter()
df2 = pandas.DataFrame(numpy.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range(1, numOfRows):
    df2.loc[df2.index.max()+1] = numpy.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

# Method 3: collect dicts in a list, build the DataFrame once
startTime = time.perf_counter()
row_list = []
for i in range(0, 5):
    row_list.append(dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range(1, numOfRows):
    dict1 = dict((a, numpy.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)
df3 = pandas.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))

P.S. I believe my implementation isn't perfect, and maybe there is some optimization.

Mikhail_Sam answered 2019-01-25T09:40:30Z

13 votes

This is not an answer to the OP's question but a toy example to illustrate @ShikharDua's answer, which I found very useful.

While this fragment is trivial, in the actual data I had 1,000 rows and many columns, and I wanted to group by different columns and then compute the statistics below for more than one target column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you @ShikharDua!

import pandas as pd

BaseData = pd.DataFrame({'Customer': ['Acme','Mega','Acme','Acme','Mega','Acme'],
                         'Territory': ['West','East','South','West','East','South'],
                         'Product': ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list = []
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd = {}  # initialise an empty dict
    RecordtoAdd.update({'Customer': name})
    RecordtoAdd.update({'Num Unique Products': len(pd.unique(group['Product']))})
    RecordtoAdd.update({'List Unique Products': pd.unique(group['Product'])})
    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

user3250815 answered 2019-01-25T09:41:03Z

7 votes

You can also build up a list of lists and convert it to a dataframe:

import pandas as pd

rows = []
columns = ['i','double','square']
for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

giving

   i  double  square
0  0       0       0
1  1       2       1
2  2       4       4
3  3       6       9
4  4       8      16
5  5      10      25

Brian Burns answered 2019-01-25T09:41:45Z

5 votes

Create a new record (data frame) and add it to old_data_frame.

Pass a list of values and the corresponding column names to create new_record (a data frame):

new_record = pd.DataFrame([[0,'abcd',0,1,123]],columns=['a','b','c','d','e'])
old_data_frame = pd.concat([old_data_frame,new_record])
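
One detail to watch with this pattern: a plain concat keeps each frame's own index labels, so the new record's label 0 can collide with an existing one. A small sketch, assuming a one-row old_data_frame invented for illustration:

```python
import pandas as pd

cols = ["a", "b", "c", "d", "e"]
old_data_frame = pd.DataFrame([[1, "wxyz", 2, 3, 456]], columns=cols)
new_record = pd.DataFrame([[0, "abcd", 0, 1, 123]], columns=cols)

# Plain concat keeps the original labels, so label 0 appears twice.
stacked = pd.concat([old_data_frame, new_record])
print(stacked.index.tolist())

# ignore_index=True renumbers the result 0..n-1.
renumbered = pd.concat([old_data_frame, new_record], ignore_index=True)
print(renumbered.index.tolist())
```

Duplicate labels are legal in pandas, but a later .loc[0] then returns both rows, which is rarely what is wanted.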

Jack Daniel answered 2019-01-25T09:42:17Z

4 votes

Figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Qinsi answered 2019-01-25T09:42:44Z

3 votes

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    # DataFrame.append requires pandas < 2.0
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd

def add_row(self, row):
    self.loc[len(self.index)] = row

pd.DataFrame.add_row = add_row

qed answered 2019-01-25T09:43:19Z

1 votes

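Keep it simple. Take a list as input and append it as a row of the data frame:

import pandas as pd

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
for i in range(5):
    # read three whitespace-separated integers from standard input
    res_list = list(map(int, input().split()))
    # DataFrame.append requires pandas < 2.0
    res = res.append(pd.Series(res_list, index=['lib','qty1','qty2']), ignore_index=True)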

Vineet Jain answered 2019-01-25T09:43:49Z

1 votes

This is a way to add/append a row in a pandas DataFrame:

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()

add_row(df, [1,2,3])

It can be used to insert/append a row in an empty or populated pandas DataFrame.
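
A quick sketch of what this helper does in practice, using a made-up two-row frame: the new row ends up at the top (label 0), and the frame passed in is modified as a side effect.

```python
import pandas as pd


def add_row(df, row):
    # Same helper as above: write the row at label -1, shift labels up, sort.
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()


df = pd.DataFrame({"A": [10, 20], "B": [30, 40]})
result = add_row(df, [1, 2])

print(result)  # the new row sits at label 0, i.e. at the top
print(df)      # the caller's frame was changed too, and is left unsorted
```

If the original frame should stay untouched, passing df.copy() into the helper is one way around the side effect.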

Shivam Agrawal answered 2019-01-25T09:44:33Z

0 votes

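import pandas as pd

# `rows` (a sequence of row value lists) and `columns` (the column names)
# are placeholders for your own data
t1 = pd.DataFrame()
for i, row in enumerate(rows):
    # add rows as columns
    t1[i] = list(row)
t1 = t1.transpose()
t1.columns = list(columns)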

Vicky answered 2019-01-25T09:44:49Z

-1 votes

This takes care of adding an item to an empty DataFrame. The issue is that df.index.max() == nan for the first index:

import math
import pandas as pd

df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])
df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]

tomatom answered 2019-01-25T09:45:13Z