[Python-Dev] [Python-checkins] r43358 - python/trunk/Modules/itertoolsmodule.c

Tim Peters tim.peters at gmail.com
Wed Mar 29 01:13:47 CEST 2006

[Thomas Wouters]
> ...
> The cycle this nested generator creates, which is also involved in the test_tee
> leak, is not cleanable by the cycle-gc, and it looks like it hasn't been
> since the yield-expr/coroutine patch was included in the trunk.

That could very well be.  Adding finalizers to generators could
legitimately make some cycles "suddenly" uncollectable.  There was a
burst of list traffic about this on the 18th and 19th of June, 2005,
under subject:

    gcmodule issue w/adding __del__ to generator objects

I starred it in gmail at the time (which is why I was able to find it
again ;-)), but never had time to pay attention.

> I believe the culprit to be this part of that patch:
> Index: Modules/gcmodule.c
> --- Modules/gcmodule.c  (revision 39238)
> +++ Modules/gcmodule.c  (revision 39239)
> @@ -413,10 +413,8 @@
>                 assert(delstr != NULL);
>                 return _PyInstance_Lookup(op, delstr) != NULL;
>         }
> -       else if (PyType_HasFeature(op->ob_type, Py_TPFLAGS_HEAPTYPE))
> +       else
>                 return op->ob_type->tp_del != NULL;
> -       else
> -               return 0;
>  }

Right, that's the patch that taught gc that generators now have finalizers.

The original code can be read in two ways:

1. As a whole, it was an implementation of:

       Only instances of old- or new-style classes can have finalizers.
       An instance of an old-style class has a finalizer iff
       it has a method named "__del__".
       An instance of a new-style class (PyType_HasFeature(op->ob_type,
       Py_TPFLAGS_HEAPTYPE) has a finalizer iff its tp_del is non-NULL.

2. Because of the obscure gimmicks that try to cater to using old
    binary extension modules across new Python releases without
    recompiling, there's no guarantee that the tp_del slot even exists,
    and therefore we don't try to access tp_del at all unless
    PyType_HasFeature says the type object was compiled recently
    enough so that we know tp_del does exist.

When generators grew finalizers, the "Only instances of ... classes
can have finalizers" part of #1 became wrong, and I expect that's why
Phillip removed the conditional.  It was the right thing to do from
that point of view.

I don't understand the #2 gimmicks well enough to guess whether it was
actually safe to remove all guards before trying to access tp_del (I
run on Windows, and compiled extensions can never be reused across
Python minor releases on Windows -- if a problem was introduced here,
I'll never see it).

> Now, reverting that part of the patch in revision 39239

There may be a reason to change that patch (due to #2 above), but
generators do have finalizers now and the #1 part must not be reverted
(although it may be fruitful to complicate it ;-)).

> triggers an assertion failure, but removing the assertion makes it work right;

No, it's not right if has_finalizer(g) returns 0 for all generators g.

> the above nested-generator cycle gets cleaned up again. Later in the trunk, the
> assertion was changed anyway, and it no longer fails if I just revert the
> gcmodule change. Of course, the reason the cycle is uncleanable is because
> generators grew a tp_del method, and that throws cycle-gc in a hissy fit;

If by "hissy fit" you mean "gcmodule properly moves generators to the
list of objects with finalizers, thereby preventing catastrophes",
right, that's an intended hissy fit ;-)

> reverting the above patch just makes cycle-gc ignore the tp_del of
> heaptypes. I don't know enough about the cycle-gc to say whether that's good
> or bad,

Ignoring all generators' tp_del would be disastrous (opens pure-Python
code to segfaults, etc).

> but not having generators be cleanable is not very good.

Not disastrous, though.

> For what it's worth, reverting the gcmodule patch doesn't seem to break any
> tests.

I believe that.  gc disasters are typically very difficult to provoke
--, the first time :-)

> However, fixing _both_ those things (itertools.tee lack of GC, and GC's
> inability to clean generators) *still does not fix the 254 refleaks in
> test_generators*. When I backport the itertools.tee-GC changes to r39238
> (one checkin before coroutines), test_generators doesn't leak, neither the
> r39238 version of test_generators, nor the trunk version. One revision
> later, r39239, either test_generators leaks 15 references. 'Fixing'
> gcmodule, which fixes the nested-generator leak, does not change the number
> of leaks. And, as you all may be aware of, in the trunk, test_generators
> leaks 254 references; this is also the case with the gcmodule fix. So
> there's more huntin' afoot.
> In the mean time, if people knowledgeable about the cycle-GC care to comment
> about the gcmodule change wrt. heaptypes, please do.

Did the best I could above, short of explaining exactly how failing to
identify an object with a finalizer can lead to disaster.  Short
course:  gc obviously needs to know which objects are and are not
trash.  If a finalizer is associated with an object in cyclic trash,
then trash objects are potentially visible _from_ the finalizer, and
therefore can be resurrected by running the finalizer.  gc must
therefore identify all trash objects reachable from trash objects with
finalizers, because running finalizers _may_ make all such objects
"not trash" again.  Getting any part of that wrong can lead to calling
tp_clear on an object that a finalizer actually resurrects, leading to
symptoms from "hey, all the attributes on my object suddenly vanished
by magic, for no reason at all" to segfaults.

More information about the Python-Dev mailing list