gc penalty of 30-40% when manipulating large data structures?
aaron.watters at gmail.com
Fri Nov 16 15:34:31 CET 2007
Poking around, I discovered someone somewhere saying that
Python's gc adds a 4-7% speed penalty.
Since I was pretty sure I was not creating
reference cycles in nucular, I tried running the tests with garbage
collection disabled.
To my delight I found that index builds run 30-40% faster without
gc. This is really nice, because running gc.collect() afterward
shows that gc was not actually finding anything to collect.
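For anyone who wants to try the same thing, here is a minimal sketch of
the pattern: the build function below is a hypothetical stand-in for an
index build (lots of long-lived containers and references, but no
reference cycles), not nucular's actual code.

```python
import gc

def build_index(n=100_000):
    # Hypothetical stand-in for an index build: many long-lived
    # containers and references, but no reference cycles.
    index = {}
    for i in range(n):
        index.setdefault(i % 1000, []).append((i, str(i)))
    return index

gc.disable()            # suspend cyclic garbage collection
try:
    index = build_index()
finally:
    gc.enable()         # restore gc once the build is done

# If the build created no reference cycles, a full collection
# afterward finds nothing to free (gc.collect() returns the
# number of unreachable objects it found).
unreachable = gc.collect()
```

Reference counting still frees acyclic garbage immediately while gc is
disabled; only cycle detection is switched off, which is why this is
safe when you know you create no cycles.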
I haven't analyzed memory consumption, but I suspect that should
be significantly improved as well, since the index builds construct
some fairly large data structures with lots of references for the
garbage collector to keep track of.
Somewhere someone should mention the possibility that disabling
gc can greatly improve performance, with no downside, if you
don't create reference cycles. I couldn't find anything like this
on the Python site or elsewhere. As Paul (I think) said, this should
be a FAQ.
Further, maybe Python should include some sort of "backoff"
heuristic which might go like this: If gc didn't find anything and
memory size is stable, wait longer for the next gc cycle. It's
silly to have gc kicking in thousands of times in a multi-hour
run, finding nothing every time.
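Such a backoff could even be approximated today from user code, since
the gc module exposes its collection thresholds. The sketch below is
only an illustration of the idea, assuming a caller invokes it
periodically; `backoff_collect` and its starting threshold of 700 (the
CPython default for generation 0) are my own invention, not anything
in the standard library.

```python
import gc

_state = {"threshold": 700}  # 700 is CPython's default gen-0 threshold

def backoff_collect():
    # Hypothetical backoff heuristic: if a full collection finds
    # nothing unreachable, double the generation-0 threshold (capped)
    # so gc kicks in less often; reset it as soon as gc finds cycles.
    found = gc.collect()
    if found == 0:
        _state["threshold"] = min(_state["threshold"] * 2, 1_000_000)
    else:
        _state["threshold"] = 700
    gc.set_threshold(_state["threshold"])
    return found
```

A real implementation inside the interpreter could also watch process
memory size, as suggested above, but thresholds are what the gc module
lets you adjust from Python code.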
Just my 2c.
-- Aaron Watters
nucular full text fielded indexing: http://nucular.sourceforge.net