Background
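I had a piece of code in which I needed to analyze how the memory footprint of a numpy.ndarray object changes before and after a reshape operation. I used sys.getsizeof(), but the result was abnormally small and always the same fixed value.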
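The symptom can be reproduced with a short sketch. The array size and variable names here are illustrative assumptions, not the original code, and the exact number printed by sys.getsizeof depends on the NumPy version and platform.

```python
import sys
import numpy as np

a = np.arange(1_000_000, dtype=np.int64)  # a owns 8,000,000 bytes of data
b = a.reshape(1000, 1000)                 # reshape of a contiguous array: a view

print(a.nbytes, b.nbytes)            # bytes of data each array addresses
print(sys.getsizeof(b))              # small: the view owns no data buffer
print(b.base is a, b.flags.owndata)  # b borrows its data from a
```

The small number for b is the size of the ndarray object itself, which does not grow with the amount of data the view points at.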
sys.getsizeof
The documentation for sys.getsizeof states:
Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this
does not have to hold true for third-party extensions as it is
implementation specific.
Only the memory consumption directly attributed to the object is
accounted for, not the memory consumption of objects it refers to.
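In other words: it returns the size of an object in bytes. The object can be of any type. All Python built-in objects return correct results, but this is not guaranteed for objects from third-party extensions, because it depends on the specific implementation.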
So only very primitive types in built-in objects are you ever really going to get accurate results. Even for built-in container types, you usually need to use some sort of recursive function to find the “total” size of the container (list, dictionary, etc). Keep in mind, though, that a python list is really just a re-sizable array of pointers, so in a sense, it is an accurate number. 1
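The "recursive function" mentioned in the quote could look roughly like the sketch below. total_size is a hypothetical helper written for illustration, not a standard library function, and it only descends into a few built-in container types.

```python
import sys

def total_size(obj, seen=None):
    """Hypothetical helper: sum sys.getsizeof over obj and what it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count each object only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

nested = [[1.5, 2.5], {"key": "value"}, (3.5, 4.5)]
print(sys.getsizeof(nested))  # shallow: the list's own pointer array only
print(total_size(nested))     # also counts the contained objects
```

This still relies on sys.getsizeof for every leaf object, so it inherits the same caveat about third-party extension types.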
Shallow copies and deep copies also come into play here.
View/shallow-copy & Deep-copy
When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
No Copy at All
Simple assignment and passing a mutable argument to a function do not make a copy.
Simple assignments make no copy of array objects or of their data.
>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)
Python passes mutable objects as references, so function calls make no copy.
>>> def f(x):
... print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216
>>> f(a)
148293216
View or Shallow Copy
To save memory, the new object produced by a shallow copy refers to the original data.
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
Slicing an array returns a view of it:
>>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
Deep Copy
A deep copy duplicates the data completely, producing an independent array.
The copy method makes a complete copy of the array and its data.
>>> d = a.copy() # a new array object with new data is created
>>> d is a
False
>>> d.base is a # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
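As a cross-check of the three cases above, the following sketch compares object identity, data sharing, and data ownership side by side. It assumes np.shares_memory is available (it is in current NumPy releases). The two reshape lines at the end are an addition beyond the three cases, included because reshape was the operation that started this investigation.

```python
import numpy as np

a = np.arange(12)
a.shape = 3, 4      # reshape in place, so a still owns its data

b = a               # no copy: same object
c = a.view()        # view: new object, same data
s = a[:, 1:3]       # slice: also a view
d = a.copy()        # deep copy: new object, new data

for name, arr in [("b", b), ("c", c), ("s", s), ("d", d)]:
    print(name, arr is a, np.shares_memory(a, arr), arr.flags.owndata)

r = a.reshape(2, 6)   # contiguous input: reshape can return a view
t = a.T.reshape(-1)   # transposed input: reshape has to copy here
print(np.shares_memory(a, r), np.shares_memory(a, t))
```

Only b is the same object as a; c, s and r share a's data without owning it; d and t hold independent data.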
Extra Points
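- Use array.nbytes (an attribute, not a method) to get the number of bytes taken by the data of a numpy.ndarray.
- To tell whether an array is a view or an object that actually owns its data:
import numpy as np
# .base (an attribute, not a method)
a = np.arange(50)
b = a.reshape((5, 10))
print(b.base is a)  # True: b is a view of a
# may_share_memory()
print(np.may_share_memory(a, b))  # True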
- If the left-hand side of an assignment is indexed, no view or copy is created.
The rule of thumb here can be: in the context of lvalue indexing (i.e. the indices are placed in the left hand side value of an assignment), no view or copy of the array is created (because there is no need to). However, with regular values, the above rules for creating views do apply.
About the Final Exercise in 3
>>> a = numpy.arange(12).reshape(3,4)
>>> ifancy = [0,2]
>>> islice = slice(0,3,2)
>>> a[islice, :][:, ifancy] = 100
>>> a
array([[100, 1, 100, 3],
[ 4, 5, 6, 7],
[100, 9, 100, 11]])
>>> a = numpy.arange(12).reshape(3,4)
>>> ifancy = [0,2]
>>> islice = slice(0,3,2)
>>> a[ifancy, :][:, islice] = 100 # note that ifancy and islice are interchanged here
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
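The results differ because a[x, :][:, y] = t can be read as
tmp = a[x, :]
tmp[:, y] = t
A slice is a view that shares its data with a, so a is modified. A fancy index returns a copy that is independent of the original data, so a is left unchanged. This is exactly the distinction between a slice and a fancy index.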
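The distinction can be checked directly, and the chained assignment can be avoided altogether. The sketch below reuses the names from the exercise and assumes np.shares_memory is available. Putting both indices into a single indexing expression on the left-hand side is one possible way to make the assignment reach a in either case.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
ifancy = [0, 2]
islice = slice(0, 3, 2)

print(np.shares_memory(a, a[islice, :]))  # slice: the temporary is a view
print(np.shares_memory(a, a[ifancy, :]))  # fancy index: the temporary is a copy

# A single lvalue indexing expression writes straight into a,
# so no intermediate view or copy is involved.
a[ifancy, islice] = 100
print(a)
```

After the assignment, rows 0 and 2 of a hold 100 in columns 0 and 2, matching the first result of the exercise.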