[Python-Dev] [Python-checkins] r43358 - python/trunk/Modules/itertoolsmodule.c
Tim Peters
tim.peters at gmail.com
Wed Mar 29 01:13:47 CEST 2006
[Thomas Wouters]
> ...
> The cycle this nested generator creates, which is also involved in the test_tee
> leak, is not cleanable by the cycle-gc, and it looks like it hasn't been
> since the yield-expr/coroutine patch was included in the trunk.
That could very well be. Adding finalizers to generators could
legitimately make some cycles "suddenly" uncollectable. There was a
burst of list traffic about this on the 18th and 19th of June, 2005,
under subject:
gcmodule issue w/adding __del__ to generator objects
I starred it in gmail at the time (which is why I was able to find it
again ;-)), but never had time to pay attention.
> I believe the culprit to be this part of that patch:
>
> Index: Modules/gcmodule.c
>===================================================================
> --- Modules/gcmodule.c (revision 39238)
> +++ Modules/gcmodule.c (revision 39239)
> @@ -413,10 +413,8 @@
> assert(delstr != NULL);
> return _PyInstance_Lookup(op, delstr) != NULL;
> }
> - else if (PyType_HasFeature(op->ob_type, Py_TPFLAGS_HEAPTYPE))
> + else
> return op->ob_type->tp_del != NULL;
> - else
> - return 0;
> }
Right, that's the patch that taught gc that generators now have finalizers.
The original code can be read in two ways:
1. As a whole, it was an implementation of:
Only instances of old- or new-style classes can have finalizers.
An instance of an old-style class has a finalizer iff
it has a method named "__del__".
An instance of a new-style class (PyType_HasFeature(op->ob_type,
Py_TPFLAGS_HEAPTYPE) has a finalizer iff its tp_del is non-NULL.
2. Because of the obscure gimmicks that try to cater to using old
binary extension modules across new Python releases without
recompiling, there's no guarantee that the tp_del slot even exists,
and therefore we don't try to access tp_del at all unless
PyType_HasFeature says the type object was compiled recently
enough so that we know tp_del does exist.
When generators grew finalizers, the "Only instances of ... classes
can have finalizers" part of #1 became wrong, and I expect that's why
Phillip removed the conditional. It was the right thing to do from
that point of view.
I don't understand the #2 gimmicks well enough to guess whether it was
actually safe to remove all guards before trying to access tp_del (I
run on Windows, and compiled extensions can never be reused across
Python minor releases on Windows -- if a problem was introduced here,
I'll never see it).
> Now, reverting that part of the patch in revision 39239
There may be a reason to change that patch (due to #2 above), but
generators do have finalizers now and the #1 part must not be reverted
(although it may be fruitful to complicate it ;-)).
> triggers an assertion failure, but removing the assertion makes it work right;
No, it's not right if has_finalizer(g) returns 0 for all generators g.
> the above nested-generator cycle gets cleaned up again. Later in the trunk, the
> assertion was changed anyway, and it no longer fails if I just revert the
> gcmodule change. Of course, the reason the cycle is uncleanable is because
> generators grew a tp_del method, and that throws cycle-gc in a hissy fit;
If by "hissy fit" you mean "gcmodule properly moves generators to the
list of objects with finalizers, thereby preventing catastrophes",
right, that's an intended hissy fit ;-)
> reverting the above patch just makes cycle-gc ignore the tp_del of
> heaptypes. I don't know enough about the cycle-gc to say whether that's good
> or bad,
Ignoring all generators' tp_del would be disastrous (opens pure-Python
code to segfaults, etc).
> but not having generators be cleanable is not very good.
Not disastrous, though.
> For what it's worth, reverting the gcmodule patch doesn't seem to break any
> tests.
I believe that. gc disasters are typically very difficult to provoke
--, the first time :-)
> However, fixing _both_ those things (itertools.tee lack of GC, and GC's
> inability to clean generators) *still does not fix the 254 refleaks in
> test_generators*. When I backport the itertools.tee-GC changes to r39238
> (one checkin before coroutines), test_generators doesn't leak, neither the
> r39238 version of test_generators, nor the trunk version. One revision
> later, r39239, either test_generators leaks 15 references. 'Fixing'
> gcmodule, which fixes the nested-generator leak, does not change the number
> of leaks. And, as you all may be aware of, in the trunk, test_generators
> leaks 254 references; this is also the case with the gcmodule fix. So
> there's more huntin' afoot.
>
> In the mean time, if people knowledgeable about the cycle-GC care to comment
> about the gcmodule change wrt. heaptypes, please do.
Did the best I could above, short of explaining exactly how failing to
identify an object with a finalizer can lead to disaster. Short
course: gc obviously needs to know which objects are and are not
trash. If a finalizer is associated with an object in cyclic trash,
then trash objects are potentially visible _from_ the finalizer, and
therefore can be resurrected by running the finalizer. gc must
therefore identify all trash objects reachable from trash objects with
finalizers, because running finalizers _may_ make all such objects
"not trash" again. Getting any part of that wrong can lead to calling
tp_clear on an object that a finalizer actually resurrects, leading to
symptoms from "hey, all the attributes on my object suddenly vanished
by magic, for no reason at all" to segfaults.
More information about the Python-Dev
mailing list