PyObject_GC_UnTrack() no longer reliable in 2.7?
Looks like 2.7 changes introduced to exempt dicts and tuples from cyclic gc if they obviously can't be in cycles has some unintended consequences. Specifically, if an extension module calls PyObject_GC_UnTrack() on a dict it _does not want tracked_, Python can start tracking the dict again. I assume this is unintended because (a) the docs weren't changed to warn about this; and, (b) it's wrong ;-) There are two main reasons an extension module may have been calling PyObject_GC_UnTrack(): 1. For correctness, if the extension is playing games with reference counts Python isn't expecting. 2. For speed, if the extension is creating dicts (or tuples) it knows cannot participate in cycles. This came up when Jim Fulton asked me for advice about assertion failures coming out of cyclic gc in a debug build when running ZODB's tests under 2.7. Turned out to be due to the "#1 reason" above: ZODB hand-rolled its own version of weak references long before Python had them, and has a dict mapping strings ("object IDs") to complex objects where the referenced objects' refcounts intentionally do _not_ account for the references due to this dict. This had been working fine for at least 8 years, thanks to calling PyObject_GC_UnTrack() on this dict right after it's created. But under 2.7, new code in Python apparently decides to track this dict again the first time its __setitem__ is called. Cyclic gc then discovers object references due to this dict that aren't accounted for in the referenced objects' refcounts, and dies early on with an assertion failure (which it should do - the refcounts are nuts as far as Python is concerned). Jim wormed around that for now by calling PyObject_GC_UnTrack() again every time the dict's content changes, but that was relatively easy in this case because the dict is an internal implementation detail all accesses to which are under ZODB's control. Best if no changes had been needed. "Better than nothing" if the docs are changed to warn that the effect of calling PyObject_GC_UnTrack() may be undone by Python a nanosecond later ;-)
On Fri, 24 Sep 2010 15:14:32 -0400
Tim Peters
Looks like 2.7 changes introduced to exempt dicts and tuples from cyclic gc if they obviously can't be in cycles has some unintended consequences. Specifically, if an extension module calls PyObject_GC_UnTrack() on a dict it _does not want tracked_, Python can start tracking the dict again.
I assume this is unintended because (a) the docs weren't changed to warn about this; and, (b) it's wrong ;-)
It was indeed unintended. I didn't know people were using PyObject_GC_(Un)Track in other places than constructors and destructors.
There are two main reasons an extension module may have been calling PyObject_GC_UnTrack():
1. For correctness, if the extension is playing games with reference counts Python isn't expecting.
Yikes :)
2. For speed, if the extension is creating dicts (or tuples) it knows cannot participate in cycles.
The optimization is now automated in the simple cases (as you've found out!).
This came up when Jim Fulton asked me for advice about assertion failures coming out of cyclic gc in a debug build when running ZODB's tests under 2.7. Turned out to be due to the "#1 reason" above: ZODB hand-rolled its own version of weak references long before Python had them, and has a dict mapping strings ("object IDs") to complex objects where the referenced objects' refcounts intentionally do _not_ account for the references due to this dict.
Perhaps ZODB should switch to standard weak references these days? (as a bonus, chances are it will be faster)
Best if no changes had been needed. "Better than nothing" if the docs are changed to warn that the effect of calling PyObject_GC_UnTrack() may be undone by Python a nanosecond later ;-)
A doc addition will be enough, hopefully. Regards Antoine.
On Fri, Sep 24, 2010 at 3:36 PM, Antoine Pitrou
On Fri, 24 Sep 2010 15:14:32 -0400 Tim Peters
wrote: Looks like 2.7 changes introduced to exempt dicts and tuples from cyclic gc if they obviously can't be in cycles has some unintended consequences. Specifically, if an extension module calls PyObject_GC_UnTrack() on a dict it _does not want tracked_, Python can start tracking the dict again.
I assume this is unintended because (a) the docs weren't changed to warn about this; and, (b) it's wrong ;-)
It was indeed unintended. I didn't know people were using PyObject_GC_(Un)Track in other places than constructors and destructors.
There are two main reasons an extension module may have been calling PyObject_GC_UnTrack():
1. For correctness, if the extension is playing games with reference counts Python isn't expecting.
Yikes :)
2. For speed, if the extension is creating dicts (or tuples) it knows cannot participate in cycles.
The optimization is now automated in the simple cases (as you've found out!).
This came up when Jim Fulton asked me for advice about assertion failures coming out of cyclic gc in a debug build when running ZODB's tests under 2.7. Turned out to be due to the "#1 reason" above: ZODB hand-rolled its own version of weak references long before Python had them, and has a dict mapping strings ("object IDs") to complex objects where the referenced objects' refcounts intentionally do _not_ account for the references due to this dict.
Perhaps ZODB should switch to standard weak references these days? (as a bonus, chances are it will be faster)
This is the long term plan. Switching is not going to be a small project and not high on the list of priorities. (Actually, ZODB invented its own weakref mechanism after Python had weakrefs, but before weakrefs were subclassable. Using standard weakrefs was deemed too expensive in terms of memory use.) For the record, I don't consider this a Python bug. This corner of ZODB is living on the edge and deserves what it gets. :) I'm just happy the fix was ultimately pretty simple.
Best if no changes had been needed. "Better than nothing" if the docs are changed to warn that the effect of calling PyObject_GC_UnTrack() may be undone by Python a nanosecond later ;-)
A doc addition will be enough, hopefully.
Absolutely. Jim -- Jim Fulton
I assume this is unintended because (a) the docs weren't changed to warn about this; and, (b) it's wrong ;-)
It seems Jim is happy with (or has at least accepted) the behavior change. Would you still like to see it fixed (or, rather, have the 2.6 state restored)? I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x). Regards, Martin
On Fri, Sep 24, 2010 at 4:09 PM, "Martin v. Löwis"
I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x).
Where would the extra state information be stored? (to distinguish untracked and untracked-and-keep-it-that-way) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com/
Am 24.09.2010 23:22, schrieb Daniel Stutzbach:
On Fri, Sep 24, 2010 at 4:09 PM, "Martin v. Löwis"
mailto:martin@v.loewis.de> wrote: I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x).
Where would the extra state information be stored? (to distinguish untracked and untracked-and-keep-it-that-way)
As everything else: in gc_refs. Regards, Martin
[Tim]
I assume this is unintended because (a) the docs weren't changed to warn about this; and, (b) it's wrong ;-)
[Martin v. Löwis]
It seems Jim is happy with (or has at least accepted) the behavior change. Would you still like to see it fixed (or, rather, have the 2.6 state restored)?
"it's wrong ;-)" meant what it said - the track/untrack APIs weren't intended to be hints Python is free to ignore, they were meant to give the user control over whether and when their objects participated in cyclic gc. It's true that their (by far) most common uses are mandatory, to avoid tracking before a new object is sane, and to untrack again before it becomes insane when it's being torn down, but those were not the only intended uses. That said, the optimization 2.7 introduced probably has value that shouldn't be dismissed either. BTW, if it had taken Jim 1000 lines of new code instead of 2 to worm around the regression in ZODB under Python 2.7, I expect he'd be singing a different tune ;-) I view his experience as akin to the canary in the coal mine, albeit likely a mine with very few miners worldwide.
I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x).
Looks like a promising idea! gcmodule.c's IS_TRACKED macro would have to change to check both states, and likewise the debug assert in visit_reachable().
On Sat, Sep 25, 2010 at 9:20 AM, Tim Peters
I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x).
Looks like a promising idea! gcmodule.c's IS_TRACKED macro would have to change to check both states, and likewise the debug assert in visit_reachable().
Given the intent is to restore the 2.6 state, perhaps it is the "UNTRACK_ALLOW_RETRACK" optimisation that should get the new special value? Other than that, MvL's suggestion looks like a good idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Am 26.09.2010 12:54, schrieb Nick Coghlan:
On Sat, Sep 25, 2010 at 9:20 AM, Tim Peters
wrote: [MvL] I think it would be possible to have two versions of _PyGC_REFS_UNTRACKED, one being, say, -5. _PyGC_REFS_UNTRACKED_AND_KEEP_IT_THAT_WAY would be what you get when you call PyObject_GC_UnTrack; the code to do automatic tracking/untracking based on contents would use some other new API (which would be non-public in 2.7.x).
Looks like a promising idea! gcmodule.c's IS_TRACKED macro would have to change to check both states, and likewise the debug assert in visit_reachable().
Given the intent is to restore the 2.6 state, perhaps it is the "UNTRACK_ALLOW_RETRACK" optimisation that should get the new special value? Other than that, MvL's suggestion looks like a good idea.
It would work either way, of course. Regards, Martin
participants (6)
-
"Martin v. Löwis"
-
Antoine Pitrou
-
Daniel Stutzbach
-
Jim Fulton
-
Nick Coghlan
-
Tim Peters