SettingWithCopyWarning: Solutions

Original article · CC 4.0 BY-SA license · Included in the column "Python data processing"

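This article uses an example to show how to avoid SettingWithCopyWarning in pandas. It presents two working approaches: updating the DataFrame directly through .loc with a boolean mask, and conditional assignment cell by cell with .loc[row, column].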


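Problem scenario: after reading a CSV file, I needed to add a new feature column and set its values based on an existing feature. While modifying it I ran into SettingWithCopyWarning, and it took me a long time to resolve.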

A simplified example

import pandas as pd
import numpy as np

aa = np.array([1, 0, 1, 0])
bb = pd.DataFrame(aa, columns=['one'])  # one column built from the 1-D array
bb['two'] = 0                           # new column, initialized to 0
print(bb)

Output:
   one  two
0    1    0
1    0    0
2    1    0
3    0    0

Modifying the new column conditionally and printing it again triggers the warning:

for i in range(bb.shape[0]):
    if bb['one'][i] == 0:
        bb['two'][i] = 1
print(bb)

Output:
C:/PycharmProjects/NaiveBayesProduct/pandas/try_index.py:22: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  bb['two'][i] = 1
   one  two
0    1    0
1    0    1
2    1    0
3    0    1

How do you fix this?

Method 1:

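c = bb['one'] == 0      # boolean mask: True where 'one' is 0
bb.loc[c, 'two'] = 1    # one .loc assignment on bb: rows from the mask, column 'two'
print(bb)

   one  two
0    1    0
1    0    1
2    1    0
3    0    1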

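c is a Series, so why is bb.loc no longer working on a copy in this case? The difference lies in how each assignment is evaluated. bb['two'][i] = 1 is chained: Python first evaluates bb['two'] (a __getitem__ that may return either a view or a copy), then runs [i] = 1 on that intermediate object, so pandas cannot guarantee the write reaches bb and emits the warning. The .loc assignment in Method 1 is a single __setitem__ call on bb.loc, with the row selector (the boolean Series c) and the column label passed together, so pandas writes directly into bb. A boolean Series is simply a valid row selector for .loc; its type is not what makes the difference.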
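A minimal sketch contrasting a single .loc assignment, which writes into the original frame, with assignment on an explicit .copy() of selected rows, which only changes that copy. The frame reuses the example's columns 'one' and 'two'; the names mask and sub are illustrative, not from the article.

```python
import pandas as pd

# Same data as the example: 'one' is the existing feature, 'two' the new column.
bb = pd.DataFrame({'one': [1, 0, 1, 0], 'two': 0})

mask = bb['one'] == 0  # boolean Series aligned on bb's index

# One __setitem__ call on bb.loc: row selector and column label together,
# so the write goes straight into bb.
bb.loc[mask, 'two'] = 1

# An explicit copy of the selected rows is an independent object:
# assigning to it never touches bb.
sub = bb[mask].copy()
sub['two'] = 99

print(bb)
print(sub)
```

If you really do want to work on a subset, taking an explicit .copy() first makes the intent clear to both pandas and the reader.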

Method 2:


for i in range(bb.shape[0]):
    if bb['one'][i] == 0:
        bb.loc[i, 'two'] = 1    # .loc takes [row label, column label]
print(bb)

Or:

for i in range(bb.shape[0]):
    # if bb.loc['one', i] == 0:  # KeyError: 'the label [one] is not in the [index]'
    # if bb.loc[i, 'one'] == 0:  # works
    if bb.loc[i]['one'] == 0:    # works
        # bb.loc['one', i] = 1   # runs without error
        bb.loc[i, 'two'] = 1

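I tested every line marked "works" and they all run. My one remaining question was the second-to-last line: why does bb.loc['one', i] = 1 run, while bb.loc['one', i] == 0 on the second line raises KeyError?

The reason is that .loc takes [row label, column label], and 'one' is a column, not a row label. When reading, a missing label raises KeyError. When assigning, .loc treats a missing label as a request to enlarge the DataFrame ("setting with enlargement"), so bb.loc['one', i] = 1 silently appends a row labelled 'one' and a column labelled i. It raises no error, but it does not update the intended cell either; the correct order is bb.loc[i, column].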
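The behavior asked about above can be reproduced on a fresh copy of the example frame. This sketch reads bb.loc['one', 0], which raises KeyError because there is no row labelled 'one', and then assigns to bb.loc['one', 1], which enlarges the frame with a new row and a new column. The flag name read_failed is illustrative.

```python
import pandas as pd

bb = pd.DataFrame({'one': [1, 0, 1, 0], 'two': 0})

# Reading: .loc[row, column] with a row label that does not exist -> KeyError.
try:
    bb.loc['one', 0]
    read_failed = False
except KeyError:
    read_failed = True

# Writing: missing labels are accepted and added to the frame
# ("setting with enlargement"): a row 'one' and a column 1 appear,
# and the other new cells are filled with NaN.
bb.loc['one', 1] = 1

print(read_failed)
print(bb)
```

The original four rows are untouched, which is why the mistaken line looks like it "works".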

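Method 1 remains the best approach, especially when there are many rows or the condition is complex: one vectorized .loc assignment updates every matching row at once instead of looping over rows in Python.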
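For this particular case (the new column is 1 where 'one' is 0, else 0), the whole column can also be built in one vectorized step with no per-row assignment at all. A sketch of two equivalent forms; numpy.where is an alternative the article itself does not use, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame({'one': [1, 0, 1, 0]})
cond = bb['one'] == 0  # boolean Series

# Form 1: turn the True/False condition straight into 1/0.
two_from_bool = cond.astype(int)

# Form 2: numpy.where picks 1 where cond is True, else 0.
two_from_where = np.where(cond, 1, 0)

# Assigning a whole column on bb is a plain column set, not chained indexing.
bb['two'] = two_from_bool
print(bb)
```

The .loc mask form from Method 1 is still the more general tool when only some rows should change and the rest must keep their existing values.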