[Python-Dev] Lazily GC tracking tuples
Daniel Dunbar
evilzr@yahoo.com
Sat, 4 May 2002 02:25:54 -0700 (PDT)
A few days ago the idea of untracking, or somehow
removing, immutable and provably acyclic tuples from
garbage collection was bounced about. At the time the
idea was to untrack a tuple once it was noticed that
it contained no cyclic objects; this proves annoying
to implement because of unfilled tuples and C API
calls that mutate the tuple (highly unusual, it seems).
Lazily tracking the tuples seems to be a much simpler
proposition: a tuple can only be a member of a cycle
once an element of ob_item is set to an object that
could contain a cycle. Presumably items are set in a
tuple far less frequently than they are accessed (by
normal code or the collector), so I imagine this method
is also more efficient.
The changes required are fairly small:
PyObject_GC_IS_TRACKED is added to determine whether an
object is tracked, the GC track and untrack operations
in tupleobject.c are modified to account for the tuple
not always being tracked, and PyTuple_SET_ITEM and
PyTuple_SetItem are modified to begin tracking the
tuple if a) it is not already tracked, and b) the
item being set _could_ contain a cycle.
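For concreteness, here is a rough sketch of the two hooks against
the 2.2-era C API. This is not the actual patch, and the helper
names are invented for illustration; PyObject_GC_IS_TRACKED is the
macro the patch adds:

    #include "Python.h"

    /* Illustrative sketch only, not the actual patch.  Start GC
       tracking of a tuple lazily: only once an item that could itself
       participate in a cycle is stored into it. */
    static void
    lazy_track_on_store(PyObject *tuple, PyObject *newitem)
    {
        if (newitem != NULL &&
            PyObject_IS_GC(newitem) &&   /* conservative "could be cyclic" test */
            !PyObject_GC_IS_TRACKED(tuple))
            PyObject_GC_Track(tuple);
    }

    /* The untrack side (e.g. in the deallocator) has to be guarded as
       well, since the tuple may never have been tracked at all. */
    static void
    untrack_if_tracked(PyObject *tuple)
    {
        if (PyObject_GC_IS_TRACKED(tuple))
            PyObject_GC_UnTrack(tuple);
    }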
At the moment, I use PyObject_IS_GC as a conservative
approximation to '_could_ contain a cycle', which
works well except for nested tuples.
In theory, if v is a tuple, _SetItem(o,i,v) should only
be called when v has been completely filled, so if v
is untracked then we know it also cannot be a member of a
cycle, which solves the nested tuple problem.
In practice making the nested tuple change did not
cause any change in behavior on the test suite, but
I have left it out for simplicity. Large trees
of atomic data represented as tuples seem like
a case where it could possibly make a difference.
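A sketch of what that refined test might look like (again just an
illustration, not part of the patch as posted, and the helper name
is made up):

    #include "Python.h"

    /* Refinement sketch: an item only forces tracking of the
       containing tuple if it is gc-aware AND is not itself an
       untracked (hence provably acyclic) tuple. */
    static int
    item_could_contain_cycle(PyObject *item)
    {
        if (item == NULL || !PyObject_IS_GC(item))
            return 0;    /* atomic objects cannot take part in a cycle */
        if (PyTuple_Check(item) && !PyObject_GC_IS_TRACKED(item))
            return 0;    /* a filled, still-untracked tuple is known acyclic */
        return 1;
    }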
A 2.23 patch for the changes outlined above is at:
http://www.geocities.com/evilzr/lazytuplegc/
(I wasn't sure if such things are appropriate for
the SF patch manager - yes/no?)
If you want to actually test the changes you need
a recent checkout; there was a minor collector bug
that the lazy tracking exploited heavily.
Some stats
----------
                      cvs      | lazytuplegc
  ------------------------------------------
  iterzip_test, N=500000
    juststore:        1.46     | 1.43
      gc size:        1143     | 788
    justtups:         1.40     | 1.30
      gc size:        1143     | 788
    storetups:        8.48     | 4.11
      gc size:        501143   | 788

  base gc set size
    python:           1081     | 742
    idle:             7754     | 3818

  pybench 1.0 avg time (n=10, warp=20)
                      69813.40 | 69708.04
The iterzip_test is only slightly munged from
the one Tim Peters posted on the iterzip() thread.
The pybench averages are almost identical; however,
there are large fluctuations on the individual tests - see
http://www.geocities.com/evilzr/lazytuplegc/cmp.txt
The base gc set numbers are obtained by starting
python (or idle) and immediately running a collection
with gc.DEBUG_STATS on. All gc size numbers are
size(gen0)+size(gen1)+size(gen2).
This patch causes one regrtest to fail (test_gc,
in test_staticmethod), for reasons I do not yet
understand (help!).
... make of it what you will :)
Oh and, not to make this post longer, but here's
the first-pydev-list-posting info:
I'm Daniel Dunbar; my introduction to Python was
embedding it into the Blender 3D package (www.blender3d.com,
but we went out of business - oops).
Since then I have used it for lots of esoteric
little tasks, but only recently, after wrapping
some of my C-based libraries, have I started
writing complex (i.e. applications, not utilities)
programs with it.
My main interest related to Python development
is making it go faster (so I can optimize my code
less)... but that's probably not rare.
=====
daniel dunbar
evilzr@yahoo.com