gc penalty of 30-40% when manipulating large data structures?

Aaron Watters
Fri Nov 16 15:34:31 CET 2007

While poking around, I found someone claiming that
Python's gc adds a 4-7% speed penalty.

Since I was pretty sure I was not creating reference cycles
in nucular, I tried running the tests with garbage
collection disabled.

To my delight, I found that index builds run 30-40% faster without
gc.  This is really nice because testing with gc.collect() afterward
shows that gc was not actually reclaiming anything.
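
For anyone who wants to try the same comparison, here is a rough sketch.
It is not the nucular test code: build_structure is a made-up stand-in for
an index build, and timed_build and unreachable_after_build are helper
names invented for this example. The only real API used is the standard
gc and time modules. Timings will vary by workload and Python version,
so treat whatever it prints as a hint, not a benchmark.

```python
import gc
import time


def build_structure(n):
    # Hypothetical stand-in for an index build: many small containers
    # that only reference their own contents, so no reference cycles.
    return [{"id": i, "terms": [i, i + 1, i + 2]} for i in range(n)]


def timed_build(n, use_gc):
    """Time build_structure(n) with the cyclic collector on or off."""
    was_enabled = gc.isenabled()
    gc.collect()  # start from a clean slate either way
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        data = build_structure(n)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, len(data)


def unreachable_after_build(n):
    """Return what gc.collect() reports once the built data is dropped."""
    gc.collect()  # clear garbage that predates the build
    data = build_structure(n)
    del data  # reference counting alone should free everything here
    return gc.collect()  # number of unreachable objects found


if __name__ == "__main__":
    n = 500_000
    with_gc, _ = timed_build(n, use_gc=True)
    without_gc, _ = timed_build(n, use_gc=False)
    print("gc enabled:  %.3fs" % with_gc)
    print("gc disabled: %.3fs" % without_gc)
    print("unreachable objects found:", unreachable_after_build(n))
```

If the last number is zero for your real workload, the collector had
nothing to reclaim and its passes were pure overhead.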

I haven't analyzed memory consumption but I suspect that should
be significantly improved also, since the index builds construct
some fairly large data structures with lots of references for a
garbage collector to keep track of.

Somewhere someone should mention the possibility that disabling
gc can greatly improve performance with no downside if you
don't create reference cycles.  I couldn't find anything like this
on the Python site or elsewhere.  As Paul (I think) said, this should
be a FAQ.

Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle.  It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
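
The backoff idea can be approximated from user code, as a sketch only.
The function name collect_with_backoff and its parameters are invented
here; gc.collect, gc.get_threshold and gc.set_threshold are the real
calls. This version ignores the "memory size is stable" condition and
backs off purely on whether a collection found anything, which is an
assumption made to keep the example short.

```python
import gc


def collect_with_backoff(base_threshold, factor=2, max_threshold=1000000):
    """Run a collection, then raise or reset the first gc threshold.

    base_threshold is the value to fall back to once garbage shows up;
    the caller would normally capture it from gc.get_threshold()[0].
    """
    found = gc.collect()  # number of unreachable objects found
    t0, t1, t2 = gc.get_threshold()
    if found == 0:
        # Nothing to reclaim: wait longer before the next automatic pass.
        t0 = min(t0 * factor, max_threshold)
    else:
        # Cycles exist after all: go back to the original frequency.
        t0 = base_threshold
    gc.set_threshold(t0, t1, t2)
    return found, t0
```

A long-running job could call this every so often, for example once per
batch, instead of turning gc off outright. That keeps a safety net in
place if some library does create cycles.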

Just my 2c.
   -- Aaron Watters

nucular full text fielded indexing: http://nucular.sourceforge.net

