Starting from the Memory Footprint of a Reshaped Object

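This article looks at how memory usage changes after reshaping an object in Python, and at how sys.getsizeof behaves in different situations. It also explains the differences between a view, a shallow copy, and a deep copy, and points out that simple assignment and passing mutable arguments create no copy. Finally, it covers how to compute the size of a numpy array, how to check whether an array is a view, and how slice and fancy indexing differ in certain contexts.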

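Background

A piece of code needed to analyze how the memory occupied by a numpy.ndarray object changes before and after a reshape operation. sys.getsizeof() was used, but the result was abnormally small and always the same fixed value.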
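The symptom can be reproduced with a minimal sketch like the one below. The array names and sizes are made up for illustration; the point is only that reshape returns a view here, and sys.getsizeof on a view does not include the data buffer.

```python
import sys
import numpy as np

small = np.arange(12)
large = np.arange(1_200_000)

small_2d = small.reshape(3, 4)        # a view of small
large_2d = large.reshape(3, 400_000)  # a view of large

# The views own no data, so getsizeof reports only the array header:
# the two numbers are small and identical despite the size difference.
print(sys.getsizeof(small_2d), sys.getsizeof(large_2d))

# nbytes reports the size of the elements each array addresses.
print(small_2d.nbytes, large_2d.nbytes)
```

The exact header size depends on the NumPy version and platform, so no specific number is shown here.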

sys.getsizeof

The documentation for sys.getsizeof says:

Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this
does not have to hold true for third-party extensions as it is
implementation specific.

Only the memory consumption directly attributed to the object is
accounted for, not the memory consumption of objects it refers to.

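In other words: the result is correct for Python's built-in objects, but for objects from third-party extensions (numpy.ndarray among them) nothing is guaranteed, because the value is decided by the extension's own implementation. In every case, only the memory directly attributed to the object is counted, not the memory of the objects it refers to.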

So only very primitive types in built-in objects are you ever really going to get accurate results. Even for built-in container types, you usually need to use some sort of recursive function to find the “total” size of the container (list, dictionary, etc). Keep in mind, though, that a python list is really just a re-sizable array of pointers, so in a sense, it is an accurate number. [1]
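The "recursive function" mentioned above could look roughly like the sketch below. The helper total_size is hypothetical (it is not part of the standard library) and only handles a few built-in container types; it is meant to show the idea, not to be a complete memory profiler.

```python
import sys

def total_size(obj, seen=None):
    """Roughly sum sys.getsizeof over obj and everything it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

words = ["alpha", "beta", "gamma"]
print(sys.getsizeof(words))   # the list itself: header plus pointers
print(total_size(words))      # the list plus the strings it refers to
```

The seen set keeps an object that is referenced twice from being counted twice, which matters for nested containers that share elements.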

Shallow copies and deep copies also come into play here.

View/shallow-copy & Deep-copy

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

No Copy at All

Simple assignment and passing mutable arguments do not make a copy.

Simple assignments make no copy of array objects or of their data.

>>> import numpy as np
>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

Python passes mutable objects as references, so function calls make no copy.

>>> def f(x):
...     print(id(x))
...
>>> id(a)                           # id is a unique identifier of an object
148293216
>>> f(a)
148293216
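
Because no copy is made on a call, writes inside a function are visible to the caller, while rebinding the parameter name is not. A small sketch, with function names invented for illustration:

```python
import numpy as np

def zero_first(x):
    # x is the caller's array, not a copy, so this write is visible outside.
    x[0] = 0

def rebind(x):
    # x * 2 builds a new array; rebinding the local name leaves the caller's alone.
    x = x * 2
    return x

a = np.arange(1, 5)
zero_first(a)
print(a.tolist())            # [0, 2, 3, 4]
doubled = rebind(a)
print(a.tolist())            # [0, 2, 3, 4]
print(doubled.tolist())      # [0, 4, 6, 8]
```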

View or Shallow Copy

To save memory, the new object produced by a shallow copy refers to the original data.
Different array objects can share the same data. The view method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view of it:

>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
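
Coming back to reshape: it returns a view only when the new shape can be expressed over the existing buffer. The sketch below assumes a non-contiguous slice as input, in which case reshape has to copy; np.shares_memory is used to check, since it answers the question directly.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
s = a[:, 1:3]                      # slice: a view onto columns 1 and 2
print(np.shares_memory(a, s))      # True

flat = s.reshape(-1)               # s is not contiguous, so reshape must copy
print(np.shares_memory(a, flat))   # False

flat[:] = -1                       # writes go to the copy only
print(a[0, 1])                     # 1
```

So whether a reshaped array costs extra memory depends on the layout of the array being reshaped, not on reshape alone.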

Deep Copy

A deep copy produces a completely independent copy of the data.
The copy method makes a complete copy of the array and its data.

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
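
One practical consequence for memory: a small view keeps its whole base array alive, whereas a copy does not. The following sketch uses made-up names and sizes to show how the two differ.

```python
import numpy as np

big = np.arange(1_000_000)

head_view = big[:100]          # view: refers to big's buffer
head_copy = big[:100].copy()   # copy: owns a fresh 100-element buffer

print(head_view.base is big)               # True
print(head_copy.base is None)              # True
print(np.shares_memory(big, head_copy))    # False
print(head_view.nbytes == head_copy.nbytes)  # True: nbytes counts addressed elements
```

As long as head_view exists, the full buffer of big cannot be freed even if the name big is deleted; with head_copy, only the 100 copied elements need to stay in memory.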

Extra Points

  • Use array.nbytes (an attribute, not a method) to get the size in bytes of the elements of a numpy.ndarray.
  • To tell whether an object is a view or actually owns its data:
# .base (an attribute, not a method)
import numpy as np
a = np.arange(50)
b = a.reshape((5, 10))
print(b.base is a)

# may_share_memory()
print(np.may_share_memory(a, b))
  • If the indices appear on the left-hand side of an assignment, no view or copy is created.
    The rule of thumb here can be: in the context of lvalue indexing (i.e. the indices are placed in the left hand side value of an assignment), no view or copy of the array is created (because there is no need to). However, with regular values, the above rules for creating views do apply.
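
The lvalue rule in the last point can be illustrated with a short sketch: the same fancy index modifies the array when it sits on the left of an assignment, but yields an independent copy when it is used as a regular value.

```python
import numpy as np

a = np.arange(5)
a[[0, 2]] = 100          # index on the left: writes go straight into a
print(a.tolist())        # [100, 1, 100, 3, 4]

b = a[[1, 3]]            # index on the right: fancy indexing returns a copy
b[:] = -1
print(a.tolist())        # [100, 1, 100, 3, 4]
print(np.shares_memory(a, b))  # False
```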

On the Final Exercise in [3]

>>> a = np.arange(12).reshape(3,4)
>>> ifancy = [0,2]
>>> islice = slice(0,3,2)
>>> a[islice, :][:, ifancy] = 100
>>> a
array([[100,   1, 100,   3],
       [  4,   5,   6,   7],
       [100,   9, 100,  11]])
       
>>> a = np.arange(12).reshape(3,4)
>>> ifancy = [0,2]
>>> islice = slice(0,3,2)
>>> a[ifancy, :][:, islice] = 100  # note that ifancy and islice are interchanged here
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

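The results differ because a[x, :][:, y] = 100 can be read as two steps:

tmp = a[x, :]
tmp[:, y] = 100

When x is a slice, tmp is a view that shares data with a, so the assignment changes a. When x is a fancy index, tmp is a copy that is independent of the original data, so a is left unchanged. This is exactly the difference between a slice and a fancy index.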
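
If the intent is to write into a regardless of which index is fancy, one option is to avoid the chained indexing and put both indices into a single expression on the left-hand side. The sketch below reuses ifancy and islice from the exercise; the second variant with np.ix_ is an alternative for the case where both indices are lists.

```python
import numpy as np

ifancy = [0, 2]
islice = slice(0, 3, 2)

# One indexing expression on the left-hand side: no temporary copy is involved.
a = np.arange(12).reshape(3, 4)
a[ifancy, islice] = 100
print(a.tolist())
# [[100, 1, 100, 3], [4, 5, 6, 7], [100, 9, 100, 11]]

# With two index lists, np.ix_ builds the row/column cross product.
b = np.arange(12).reshape(3, 4)
b[np.ix_([0, 2], [0, 2])] = 100
print(b.tolist())
# [[100, 1, 100, 3], [4, 5, 6, 7], [100, 9, 100, 11]]
```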

References

  1. https://stackoverflow.com/questions/38113549/python-sys-getsizeof-vaue-of-numpy-ndarray-seems-too-small
  2. https://stackoverflow.com/questions/11784329/python-memory-usage-of-numpy-arrays
  3. https://docs.scipy.org/doc/numpy/user/quickstart.html#copies-and-views