Object reference cycles and GC
In CPython, every instance of a class created with the class syntax takes part in the cyclic GC mechanism. This increases the memory footprint of each instance and can cause memory problems in heavily loaded systems.
Is it possible to rely on the reference counting mechanism alone when necessary?
Let's analyze one approach, based on the recordclass library, that helps to create classes whose instances are deleted only by the reference counting mechanism.
Note: this is a translation of the original post (in Russian).
A little bit about garbage collection in CPython
The primary mechanism for garbage collection in Python is reference counting. Each object contains a field that holds the current number of references to it. An object is destroyed as soon as its reference count drops to zero. However, reference counting alone cannot reclaim objects that are part of a reference cycle. For example:
lst = []
lst.append(lst)
del lst
In such cases, after the name is deleted, the object's reference count remains greater than zero. To solve this problem, Python has an additional mechanism that tracks objects and breaks cycles in the graph of references between them. There is a good article on how the cyclic garbage collection mechanism works in CPython 3.
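The behaviour described above can be observed with the standard gc and weakref modules. The following is a minimal sketch, assuming CPython; the class name Node and the variable names are made up for illustration. It uses a class instance instead of a list because lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """A plain class whose instances can refer to themselves."""


gc.disable()                 # switch off automatic cyclic collection for the demo
node = Node()
node.me = node               # the instance now refers to itself: a reference cycle
probe = weakref.ref(node)    # a weak reference does not keep the object alive

del node                     # the name is gone, but the cycle keeps the count above zero
alive_after_del = probe() is not None

gc.collect()                 # run the cyclic collector explicitly
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```

With reference counting alone the object survives del; it is reclaimed only once the cyclic collector runs.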
Memory overhead associated with the garbage collection mechanism
Typically, the garbage collection mechanism does not cause problems. But there is a certain overhead associated with it:
A PyGC_Head header is added to each instance of the class during memory allocation: at least 24 bytes in Python <= 3.7 and 16 bytes in 3.8 on a 64-bit platform.
This can lead to a memory shortage if you run many instances of the same process, each of which needs to hold a very large number of objects with relatively few attributes at the same time, and the amount of memory is limited.
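To get a feel for the numbers, the per-instance overhead can be estimated from the difference between sys.getsizeof() and __sizeof__(). This is a rough sketch, assuming a standard CPython build where sys.getsizeof() adds the GC header for GC-enabled objects; the class SlotPoint and the instance count are hypothetical, and the printed figure depends on the interpreter version.

```python
import sys


class SlotPoint:
    """Hypothetical small record type with two fields."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def gc_header_size(obj):
    # sys.getsizeof() includes the GC header when the object has one,
    # while __sizeof__() reports the object body only.
    return sys.getsizeof(obj) - obj.__sizeof__()


header = gc_header_size(SlotPoint(1, 2))
float_header = gc_header_size(1.5)      # floats are not GC-enabled objects

count = 10_000_000                      # assumed number of live instances
extra_mib = header * count / 2**20      # memory spent on GC headers alone

print(header, float_header, f"{extra_mib:.1f} MiB")
```

The estimate covers the header only; it does not include the cost of the collector traversing the tracked objects.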
Is it sometimes possible to limit oneself to the basic mechanism of reference counting?
The cyclic garbage collection mechanism may be redundant when a class represents a non-recursive data type, for example a record containing values of simple types (numbers, strings, date/time). To illustrate, consider a simple class:
from dataclasses import dataclass

@dataclass  # generates __init__(x, y) so that Point(0, 0) works
class Point:
    x: int
    y: int
If the class is used correctly, reference cycles are not possible. In Python, though, nothing prevents you from shooting yourself in the foot:
p = Point(0, 0)
p.x = p
Here the instance refers to itself, so if cyclic GC is disabled, the object will never be reclaimed.
However, for the Point class the reference counting mechanism alone would be enough, provided that no reference cycles are created while the program runs, that is, the x and y attributes only ever hold integer values, as declared in the class definition. But there is no standard way yet to opt a user-defined class out of cyclic GC.
Modern CPython is designed so that, when a user-defined class is created, the flag Py_TPFLAGS_HAVE_GC is always set in the type structure that describes the class. This flag determines that instances of the class take part in the garbage collection mechanism. Every such object gets a PyGC_Head header when it is created and is added to the list of tracked objects. If Py_TPFLAGS_HAVE_GC is not set, only the basic reference counting mechanism applies. However, simply clearing Py_TPFLAGS_HAVE_GC is not enough: the CPython core code responsible for creating and destroying instances would also have to change. This remains problematic because it is too big a change to the core of CPython.
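The flag can be inspected from Python through the __flags__ attribute of a type. This is a small sketch, assuming CPython; the constant below is copied from CPython's object.h (bit 14) and is not exported by any standard library module, and the class names are made up for illustration.

```python
# Value of Py_TPFLAGS_HAVE_GC in CPython's object.h (assumption: bit 14).
PY_TPFLAGS_HAVE_GC = 1 << 14


def has_gc_flag(tp):
    """Return True if the type object has Py_TPFLAGS_HAVE_GC set."""
    return bool(tp.__flags__ & PY_TPFLAGS_HAVE_GC)


class Regular:
    """Ordinary class: instances carry a __dict__."""


class Slotted:
    """Class with __slots__: no __dict__, but still GC-enabled."""
    __slots__ = ("x", "y")


for tp in (Regular, Slotted, list, int, float):
    print(tp.__name__, has_gc_flag(tp))
```

Both user-defined classes report the flag, as do built-in containers such as list, while int and float do not.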
About one implementation
As an example of how this idea can be implemented, consider the base class dataobject from the recordclass project. With it, you can create classes whose instances do not take part in cyclic GC (Py_TPFLAGS_HAVE_GC is not set and, accordingly, there is no additional PyGC_Head header). They have exactly the same structure in memory as instances of classes with __slots__, but without PyGC_Head:
from recordclass import dataobject

class Point(dataobject):
    x: int
    y: int

>>> import sys
>>> p = Point(1, 2)
>>> print(p.__sizeof__(), sys.getsizeof(p))
32 32
For comparison, we give a similar class with __slots__:
class Point:
    __slots__ = 'x', 'y'
    x: int
    y: int

    def __init__(self, x, y):
        self.x = x
        self.y = y

>>> p = Point(1, 2)
>>> print(p.__sizeof__(), sys.getsizeof(p))  # this is in Python 3.7
32 64
The size difference is exactly the size of the PyGC_Head header. For instances with only a few attributes, such an increase in memory footprint can be significant. For instances of the Point class, adding PyGC_Head doubles the size.
To achieve this effect, a special metaclass, datatype, is used to configure subclasses of dataobject. As a result of this configuration, the flag Py_TPFLAGS_HAVE_GC is cleared and the base instance size tp_basicsize grows by the amount needed to store the additional slots for fields. The field names are listed when the class is declared (the class Point has two of them: x and y). The datatype metaclass also sets the slots tp_alloc, tp_new, tp_dealloc and tp_free, which implement the correct algorithms for creating and destroying instances in memory. By default, instances lack __weakref__ and __dict__ (as do instances of classes with __slots__).
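The comparison with __slots__ in the last sentence can be checked with the standard library alone. The sketch below uses an ordinary class with __slots__, not recordclass itself, so it only illustrates the analogous behaviour; the class name SlotPoint is hypothetical.

```python
import weakref


class SlotPoint:
    """Class with __slots__ and no '__dict__' or '__weakref__' slot."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = SlotPoint(1, 2)

has_dict = hasattr(p, "__dict__")   # no per-instance dictionary

try:
    weakref.ref(p)                  # requires a __weakref__ slot
    weakref_ok = True
except TypeError:
    weakref_ok = False

print(has_dict, weakref_ok)
```

Both values are False: the instance stores only its declared fields, so it supports neither arbitrary attributes nor weak references.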
Conclusion
As we have seen, in CPython it is possible, if necessary, to disable cyclic garbage collection for a particular class when you are confident that its instances will not form reference cycles. This also reduces the size of each instance in memory by the size of the PyGC_Head header.
In the next article we will try to demonstrate how classes based on dataobject can reduce memory usage.