How to use pandas to process an Excel sheet with a very large number of rows in seconds


qq_37112115



License: CC 4.0 BY-SA

Article tags: pandas python data analysis excel

Article link: https://blog.youkuaiyun.com/qq_37112115/article/details/131344639

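When you use pandas to process an Excel sheet with a very large number of rows, comparing the data row by row directly on the DataFrame is a poor choice. Every `df.iloc[i, j]` access goes through pandas' indexing machinery, which is far more expensive than indexing a plain Python list, and a nested loop performs on the order of len(df1) × len(df2) such accesses. Running the following typical counterexample can therefore keep the machine busy for a very long time; in the author's experience it eventually froze the computer completely.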

import pandas as pd

excel_file = pd.ExcelFile('【总】input.xlsx')
df1 = pd.read_excel(excel_file, sheet_name='Sheet1')
df2 = pd.read_excel(excel_file, sheet_name='Sheet2')

# Counterexample: every row of df1 is compared with every row of df2,
# reading and writing single cells through .iloc
i = 0
j = 0
while i < len(df1):
    print(f'Row {i}, progress {round(i / len(df1) * 100, 1)}%')
    while j < len(df2):
        # Match on columns 0, 1, 5 of df1 against columns 0, 1, 3 of df2
        if df1.iloc[i, 0] == df2.iloc[j, 0] and df1.iloc[i, 1] == df2.iloc[j, 1] and df1.iloc[i, 5] == df2.iloc[j, 3]:
            # Copy columns 16-21 of df2 into columns 14-19 of df1
            df1.iloc[i, 14] = df2.iloc[j, 16]
            df1.iloc[i, 15] = df2.iloc[j, 17]
            df1.iloc[i, 16] = df2.iloc[j, 18]
            df1.iloc[i, 17] = df2.iloc[j, 19]
            df1.iloc[i, 18] = df2.iloc[j, 20]
            df1.iloc[i, 19] = df2.iloc[j, 21]
        j += 1
    j = 0
    i += 1
df1.to_excel('测试.xlsx', index=False)
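You can check the per-cell overhead of `.iloc` directly. The sketch below uses a made-up 2,000 × 5 integer DataFrame and hypothetical helper names. It reads the same column once through `.iloc` and once from the `df.values.tolist()` result, then times both with `timeit`. Absolute timings depend on the machine, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 2,000 rows x 5 integer columns (sizes chosen arbitrarily)
df = pd.DataFrame(np.arange(10_000).reshape(2_000, 5))
rows = df.values.tolist()  # plain list of lists, as in the optimized version


def sum_first_col_iloc(frame):
    """Read column 0 one cell at a time through DataFrame.iloc."""
    total = 0
    for i in range(len(frame)):
        total += frame.iloc[i, 0]
    return total


def sum_first_col_list(data):
    """Read column 0 one cell at a time from a plain list of lists."""
    total = 0
    for i in range(len(data)):
        total += data[i][0]
    return total


# Same result, very different cost per access
t_iloc = timeit.timeit(lambda: sum_first_col_iloc(df), number=5)
t_list = timeit.timeit(lambda: sum_first_col_list(rows), number=5)
print(f".iloc: {t_iloc:.4f}s  list: {t_list:.4f}s")
```

Both functions compute the same sum. The only difference is the container accessed inside the loop, and that difference is what the nested loop above multiplies by len(df1) × len(df2).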

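So how can the code above be optimized? Convert the DataFrame objects entirely into lists and compare list against list. Indexing a plain list avoids the per-access overhead of `.iloc`, although the nested loop still performs len(list1) × len(list2) comparisons. According to the author, the only drawback is a pile of warnings that can safely be ignored. Here is the optimized version: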

import pandas as pd

excel_file = pd.ExcelFile('【总】input.xlsx')
df1 = pd.read_excel(excel_file, sheet_name='Sheet1')
df2 = pd.read_excel(excel_file, sheet_name='Sheet2')

# Convert df1 and df2 to lists of rows
list1 = df1.values.tolist()
list2 = df2.values.tolist()

# Compare the data and update list1 in place
i = 0
j = 0
while i < len(list1):
    print(f'Row {i}, progress {round(i / len(list1) * 100, 1)}%')
    while j < len(list2):
        if str(list1[i][0]) == str(list2[j][0]) and str(list1[i][1]) == str(list2[j][1]) and str(list1[i][5]) == str(list2[j][3]):
            list1[i][14] = list2[j][16]
            list1[i][15] = list2[j][17]
            list1[i][16] = list2[j][18]
            list1[i][17] = list2[j][19]
            list1[i][18] = list2[j][20]
            list1[i][19] = list2[j][21]
        j += 1
    j = 0
    i += 1

# Convert the updated list back into a DataFrame
df1_updated = pd.DataFrame(list1, columns=df1.columns)
df1_updated.to_excel('测试.xlsx', index=False)
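Converting to lists removes the `.iloc` overhead, but the loop above still scans all of list2 for every row of list1. A natural next step in the same list-based direction, which is not part of the original article, is to index list2 once in a dict keyed by the three compared columns. Each row of list1 then needs a single lookup. The following is a minimal sketch that assumes the column positions used above. `update_by_key` is a hypothetical name:

```python
def update_by_key(list1, list2):
    """Hypothetical helper: copy list2[j][16:22] into list1[i][14:20] where keys match.

    The key columns mirror the loop above: columns 0, 1, 5 of list1 against
    columns 0, 1, 3 of list2, all compared as strings. Rows are updated in place.
    """
    # Build the lookup table once: O(len(list2)). A later row with the same key
    # overwrites an earlier one, so the last match wins, as in the nested loop.
    index = {}
    for row in list2:
        index[(str(row[0]), str(row[1]), str(row[3]))] = row[16:22]

    # One dict lookup per row of list1 instead of a full scan of list2.
    for row in list1:
        match = index.get((str(row[0]), str(row[1]), str(row[5])))
        if match is not None:
            row[14:20] = match  # replaces exactly 6 cells, row length unchanged
    return list1


if __name__ == "__main__":
    # Tiny synthetic rows (made up for illustration): list1 rows need at least
    # 20 columns and list2 rows at least 22, as in the original loop.
    a = [["k", 1, 0, 0, 0, "x"] + [None] * 14]
    b = [["k", "1", 0, "x"] + [0] * 12 + list(range(100, 106))]
    print(update_by_key(a, b)[0][14:20])  # → [100, 101, 102, 103, 104, 105]
```

With the article's data, you would call it as `list1 = update_by_key(df1.values.tolist(), df2.values.tolist())` and then convert back with `pd.DataFrame(list1, columns=df1.columns)`. Total work drops from len(list1) × len(list2) comparisons to roughly len(list1) + len(list2) operations. For a fully vectorized alternative, pandas' `DataFrame.merge` on the key columns is another option, though the article does not use it.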