A few days ago the idea of untracking, or somehow removing, immutable and provably acyclic tuples from garbage collection was bounced about. At the time the idea was to untrack a tuple once it was noticed that it contained no cyclic objects; this proves annoying to implement because of unfilled tuples, and because of C API calls that mutate the tuple (highly unusual, it seems).
Lazily tracking the tuples seems to be a much simpler proposition: a tuple can only become a member of a cycle once an element of ob_item is set to an object that could itself contain a cycle. Presumably items are set in a tuple far less frequently than they are accessed (by normal code or by the collector), so I imagine this method is also more efficient.
The changes required are fairly small: PyObject_GC_IS_TRACKED is added to determine whether an object is tracked, the gc track and untrack operations in tupleobject.c are modified to account for the tuple not always being tracked, and PyTuple_SET_ITEM and PyTuple_SetItem are modified to begin tracking the tuple if (a) it is not already tracked, and (b) the item being set _could_ contain a cycle.
At the moment I use PyObject_IS_GC as a conservative approximation to '_could_ contain a cycle', which works well except for nested tuples.
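For illustration, the conservative test can be sketched at the Python level (an assumption on my part: using gc.is_tracked as a rough stand-in for the C-level PyObject_IS_GC check, since only collector-tracked objects can participate in a cycle):

```python
import gc

def could_contain_cycle(obj):
    """Conservative stand-in for the C-level PyObject_IS_GC test:
    treat any object the collector currently tracks as potentially
    cyclic, and everything else (atomic objects) as safe."""
    return gc.is_tracked(obj)

# Atomic objects can never close a cycle, so they report False;
# mutable containers such as lists report True.
print(could_contain_cycle(42))
print(could_contain_cycle("spam"))
print(could_contain_cycle([]))
```

Under this approximation a tuple would only start being tracked once one of its slots is filled with a True-reporting object.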
In theory, if v is a tuple, _SetItem(o, i, v) should only be called once v has been completely filled, so if v is untracked we know it also cannot be a member of a cycle, which solves the nested-tuple problem.
In practice, making the nested-tuple change caused no change in behavior on the test suite, but I have left it out for simplicity. Large trees of atomic data represented as tuples seem like a case where it could make a difference.
A 2.23 patch for the changes outlined above is at: http://www.geocities.com/evilzr/lazytuplegc/ (I wasn't sure if such things are appropriate for the SF patch manager - yes/no?)
If you want to actually test the changes you need a recent checkout; there was a minor collector bug that the lazy tracking exploited heavily.
                                         cvs | lazytuplegc
iterzip_test, N=500000
  juststore   time                      1.46 |     1.43
              gc size                   1143 |      788
  justtups    time                      1.40 |     1.30
              gc size                   1143 |      788
  storetups   time                      8.48 |     4.11
              gc size                 501143 |      788
base gc set size
  python                                1081 |      742
  idle                                  7754 |     3818
pybench 1.0 avg time (n=10, warp=20)
                                    69813.40 | 69708.04
The iterzip_test is only slightly munged from the one Tim Peters posted on the iterzip() thread.
The pybench averages are almost identical; however, there are large fluctuations in the individual tests, see http://www.geocities.com/evilzr/lazytuplegc/cmp.txt
The base gc set numbers are obtained by starting python (or idle) and immediately running a collection with gc.DEBUG_STATS on. All gc size numbers are size(gen0) + size(gen1) + size(gen2).
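As an aside, in a current interpreter the same total can be read directly, without parsing the DEBUG_STATS output (a sketch; gc.get_objects returns every collector-tracked object across all generations, so its length is the analogue of the gc size totals above):

```python
import gc

gc.collect()                 # run a full collection first, as above
tracked = gc.get_objects()   # the union of gen0, gen1 and gen2
print(len(tracked))          # analogue of the "gc size" numbers above
```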
This patch causes one regrtest to fail (test_gc, in test_staticmethod, for reasons I do not yet understand; help!).
... make of it what you will :)
Oh, and not to make this post longer, but here's the first-pydev-list-posting info:
I'm Daniel Dunbar, my introduction to Python was embedding it into the Blender 3D package (www.blender3d.com, but we went out of business - oops).
Since then I have used it for lots of esoteric little tasks, but only recently, after wrapping some of my C-based libraries, have I started writing complex (i.e. applications, not utilities) programs with it.
My main interest related to Python development is making it go faster (so I can optimize my code less)... but that's probably not rare.
===== daniel dunbar email@example.com