The difference is that set_value returns an object, while the assignment operator writes the value into the existing DataFrame object.
After calling set_value you will potentially have two DataFrame objects (this does not necessarily mean you'll have two copies of the data, as DataFrame objects can "reference" one another), while the assignment operator changes the data in the single DataFrame object.
It appears to be faster to use set_value, as it is probably optimized for that use case, while the assignment approach generates intermediate slices of the data (df[10] first builds a column Series, which is then indexed a second time):
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df=pd.DataFrame(np.random.rand(100,100))
In [4]: %timeit df[10][10]=7
The slowest run took 6.43 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 89.5 µs per loop
In [5]: %timeit df.set_value(10,10,11)
The slowest run took 10.89 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.94 µs per loop
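For readers on a recent pandas, where set_value is no longer available (it was deprecated in 0.21 and removed in 1.0), a comparable measurement can be sketched with the scalar accessor .at against general label-based .loc assignment. This is a minimal sketch and not the benchmark above: the helper names set_with_loc and set_with_at are hypothetical, and absolute timings will vary with the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 100))


def set_with_loc():
    # General label-based assignment; also handles slices and masks.
    df.loc[10, 10] = 7.0


def set_with_at():
    # Scalar-only accessor, the usual replacement for the removed set_value.
    df.at[10, 10] = 11.0


# Time each approach; the numbers are only meaningful relative to each other.
runs = 1000
loc_time = timeit.timeit(set_with_loc, number=runs)
at_time = timeit.timeit(set_with_at, number=runs)
print(f"loc: {loc_time / runs * 1e6:.2f} us per call")
print(f"at:  {at_time / runs * 1e6:.2f} us per call")
```

Both helpers write into the same DataFrame, so neither leaves a second object behind; the sketch only compares the cost of the two access paths.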
The result of set_value may be a copy, but the documentation is not really clear (to me) on this:
Returns:
frame : DataFrame
If label pair is contained, will be reference to calling DataFrame, otherwise a new object
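The distinction the docstring draws between an existing label pair and a missing one can be illustrated on a recent pandas with .at and .loc. This is a sketch under assumptions: set_value itself is not called, because it was removed in pandas 1.0, so the snippet shows how the current accessors behave and does not reproduce set_value's return value. The name same_object is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 3)))
same_object = df  # a second name bound to the same DataFrame object

# Existing label pair: the value is written into the existing object,
# so it is visible through every name that refers to that object.
df.at[1, 1] = 7.0
print(same_object.at[1, 1])

# Missing row label: .loc enlarges the frame by adding row 3.
# Cells in the new row that were not assigned are filled with NaN.
df.loc[3, 0] = 1.0
print(df.shape)
print(df.loc[3])
```

With these accessors the assignment statement returns nothing, so there is no second object to keep track of, which avoids the ambiguity in the set_value return value quoted above.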
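Summary: this post discusses the difference between the set_value method and direct assignment in Pandas. set_value returns an object and may produce a new DataFrame instance, while assignment modifies the existing DataFrame directly. The timings show set_value to be faster, but its result may be a copy of the data.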