[Python-Dev] Lazily GC tracking tuples

4 May 2002

      A few days ago the idea of untracking, or somehow 
removing, immutable and provably uncyclic tuples from 
garbage collection was bounced about - at the time the 
idea was to untrack a tuple once it was noticed that 
it contained no cyclic objects... this proves annoying 
to implement because of unfilled tuples, and C-api 
calls that mutate the tuple (highly unusual it seems).

Lazily tracking the tuples seems to be a much simpler
proposition: a tuple can only be a member of a cycle
once an element of ob_item is set to an object that
could contain a cycle, presumably items are set in a
tuple far less frequently than they are accessed (by
normal code or the collector), so I imagine this method
is also more efficient.

The changes required are fairly small, 
PyObject_GC_IS_TRACKED is added to determine if an 
object is tracked, the gc track and untrack operations
in tupleobject.c are modified to account for the tuple 
not always being tracked, and PyTuple_SET_ITEM and 
PyTuple_SetItem are modified to begin tracking the
tuple if a) it is not already tracked, and b) the
item being set _could_ contain a cycle.

At the moment, I use PyObject_IS_GC as a conserative
approximation to '_could_ contain a cycle', which
works well except for nested tuples. 

In theory if v is a tuple, _SetItem(o,i,v) should only 
be called when v has been completely filled, so if v
is untracked then we know it also cannot be member of a
cycle, which solves the nested tuple problem.

In practice making the nested tuple change did not 
cause any change in behavior on the test suite, but
I have left it out for simplicity. Large trees 
of atomic data represented as tuples seems like
a case where it could possibly make a difference.

A 2.23 patch for the changes outlined above are at:
http://www.geocities.com/evilzr/lazytuplegc/
(I wasn't sure if such things are appropriate for
the SF-patch manager - yes/no?)

If you want to actually test the changes you need
a recent checkout, there was a minor collector bug
that the lazy tracking exploited heavily.

Some stats
-
                  cvs   | lazytuplegc
---------------------------------------
iterzip_test, N=500000
  juststore:      1.46  |     1.43
    gc size       1143  |      788
  justtups:       1.40  |     1.30
    gc size       1143  |      788
  storetups:      8.48  |     4.11 
    gc size     501143  |      788

base gc set size
python:           1081  |      742
  idle:           7754  |     3818

pybench 1.0 avg time (n=10,warp=20) 
              69813.40  | 69708.04

The iterzip_test is only slightly munged from
the one Tim Peters posted on the iterzip() thread.

The pybench averages are almost identical, however 
there are large fluctuations on the individual tests, see 
http://www.geocities.com/evilzr/lazytuplegc/cmp.txt

The base gc set numbers are obtained by starting
python (or idle) and immediatly running a collection
with gc.DEBUG_STATS on. All gc size numbers are 
size(gen0)+size(gen1)+size(gen2).

this patch causes one regrtest to fail (test_gc, 
in test_staticmethod, for reasons I do not yet 
understand (help!)).

... make of it what you will :)

Oh and, not to make this post longer, but heres
the first-pydev-list-posting-info: 

I'm Daniel Dunbar, my introduction to Python was 
embedding it into the Blender 3D package (www.blender3d.com,
but we went out of business - oops).

Since then I have used it for lots of esoteric 
little tasks, but only recently after wrapping 
some of my C-based libraries have I started 
writing complex (ie. applications not utilities)
programs with it.

My main interest related to python development
is making it go faster (so I can optimize my code
less)... but thats probably not rare.

=====
daniel dunbar
evilzr@yahoo.com

__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com