[Python-Dev] Linus on garbage collection

Stefan Behnel stefan_ml at behnel.de
Fri May 6 20:06:12 CEST 2011

Michael Foord, 06.05.2011 19:06:
> On 06/05/2011 17:51, Stefan Behnel wrote:
>> Mark Shannon, 06.05.2011 18:33:
>>> skip at pobox.com wrote:
>>>> Antoine> Since we're sharing links, here's Matt Mackall's take:
>>>> Antoine>
>>>> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>>>>> From that note:
>>>> 1: You can't have meaningful destructors, because when destruction
>>>> happens is undefined. And going-out-of-scope destructors are extremely
>>>> useful. Python is already a rather broken in this regard, so feel free
>>>> to ignore this point.
>>>> Given the presence of cyclic data I don't see how reference counting or
>>>> garbage collection win. Ignoring the fact that in a pure reference counted
>>>> system you won't even consider cycles for reclmation, would both RC and GC
>>>> have to punt because they can't tell which object's destructor to call
>>>> first?
>>> It doesn't matter which is called first.
>> May I quote you on that one the next time my software crashes?
> Arbitrarily breaking cycles *could* cause a problem if a destructor
> attempts to access an already collected object.

This is more real than the "could" suggests. Remember that CPython includes 
a lot of C code, and is commonly used to interface with C libraries. While 
you will simply get an exception when cycles are broken in Python code, 
cycles that involve C code can suffer quite badly from this problem.

There was a bug in the lxml.etree XML library a while ago that could let it 
crash hard when its Element objects participated in a reference cycle. It's 
based on libxml2, so there's an underlying C tree that potentially involves 
disconnected subtrees, and a Python space representation using Element 
proxies, with at least one Element for each disconnected subtree.

Basically, Elements reference their Document (not the other way round) even 
if they are disconnected from the main C document tree. The Document needs 
to do some final cleanup in the end, whereas the Elements require the 
Document to be alive to do their own subtree cleanup, if only to know what 
exactly to clean up, as the subtrees share some C state through the 
document. Now, if any of the Elements ends up in a reference cycle for some 
reason, the GC will throw its dices and may decide to call the Document 
destructor first. Then the Element destructors are bound to crash, trying 
to access dead memory of the Document.

This was easy to fix in CPython's refcounting environment. A double INCREF 
on the Document for each Element does the trick, as it effectively removes 
the Document from the collectable cycle and lets the Element destructors 
decide when to let the Document refcount go down to 0. A fix in a pure GC 
system is substantially harder to make efficient.


More information about the Python-Dev mailing list