[Python-Dev] Billions of gc's
Tim Peters
tim.one@comcast.net
Mon, 29 Apr 2002 23:55:41 -0400
[Aahz]
> My take is that programs with a million live objects and no cycles are
> common enough that gc should be designed to handle that smoothly.
Well, millions of live objects is common but isn't a problem. The glitch
we're looking at is a surprising slowdown with millions of live *container*
objects. The latter isn't so common.
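To make the container/non-container distinction concrete, here is a minimal sketch. It assumes a later CPython that provides gc.is_tracked(), which did not exist when this message was written; the Record class and the sample values are hypothetical. Only objects that can hold references to other objects are tracked by the cyclic collector, so only those add to the work done during a collection.

```python
import gc


class Record:
    """Hypothetical user-defined class; its instances can hold references."""


# Atomic objects cannot take part in a reference cycle, so the cyclic
# collector does not track them. Containers are tracked.
samples = {
    "int": 12345,
    "float": 3.14,
    "str": "row from a database",
    "list": [1, 2, 3],
    "instance": Record(),
}

tracked = {name: gc.is_tracked(obj) for name, obj in samples.items()}

for name, is_tracked in tracked.items():
    print(f"{name:10s} tracked by gc: {is_tracked}")
```

A million live ints or strings therefore cost the collector nothing, while a million live lists or instances are each visited on a full collection.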
> I don't think that a programmer casually writing such applications
> (say, processing information from a database) should be expected to
> understand gc well enough to tune it.
People casually writing applications pushing the limits of their boxes are
in for more surprises than just this <wink>.
> Having read the entire discussion so far, and *NOT* being any kind of
> gc expert, I would say that Tim's adaptive solution makes the most
> sense to me. For years, we told people with cyclic data to figure out
> how to fix the problem themselves; now that we have gc available, I
> don't think we should punish everyone else.
We're not trying to punish anyone, but innocent users with lots of
containers can lose big despite our wishes: if we don't check them for
cycles, they can run out of memory; if we do check them for cycles, it
necessarily consumes time.
As a datapoint, here are the times (in seconds) for justzip() on my box
after my checkin to precompute the result size (list.append behavior is
irrelevant now):
    gc disabled:  0.64
    gc enabled:   7.32
    magic=2(*):   2.63
    magic=3(*):   2.02
(*) This is gcmodule.c fiddled to add this block after "collections1 = 0;"
in the first branch of collect_generations():
    if (n == 0)
        threshold2 *= magic;
    else if (threshold2 > 5)
        threshold2 /= magic;
magic=1 is equivalent to the current code. That's all an "adaptive scheme"
need amount to, provided the "*=" part were fiddled to prevent threshold2
from becoming insanely large. Boosting magic above 3 didn't do any more
good in this test.
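The rule above, plus the cap it says it would need, can be modelled outside the interpreter. The following is only a sketch of the arithmetic in Python, not the gcmodule.c change itself. The function names, the cap parameter and its value of 100 are assumptions made for illustration; the floor of 5 and the integer division come from the C block, and the starting value of 10 is assumed to match the default threshold2.

```python
def adapt_threshold2(threshold2, n, magic=3, floor=5, cap=100):
    """Return the next threshold2 after a collection that found n objects.

    Mirrors the C block: grow when the collection found nothing, shrink
    (integer division, as in C) when it found something. `cap` is a
    hypothetical upper bound that the C block does not have.
    """
    if n == 0:
        return min(threshold2 * magic, cap)
    if threshold2 > floor:
        return threshold2 // magic
    return threshold2


def simulate(found_counts, start=10, **kwargs):
    """Apply the rule once per collection and return each new threshold2."""
    history = []
    threshold2 = start
    for n in found_counts:
        threshold2 = adapt_threshold2(threshold2, n, **kwargs)
        history.append(threshold2)
    return history


# Three collections that find nothing, one that finds garbage, one more
# that finds nothing.
print(simulate([0, 0, 0, 5, 0], magic=3))  # [30, 90, 100, 33, 99]
print(simulate([0, 0, 0, 5, 0], magic=1))  # [10, 10, 10, 10, 10]
```

With magic=1 the threshold never moves, which matches the statement that magic=1 is equivalent to the current code.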
At magic=3 it still takes 3+ times longer than with gc disabled, but that's
a whale of a lot better than the current 11+ times longer. Note that with
gc disabled, any cycle in any of the 1,000,001 containers this test creates
would leak forever -- casual users definitely get something back for the
time spent.
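The leak is easy to see in a small CPython session. This is an illustrative sketch, not the justzip() test: the Node class and the make_cycle() helper are hypothetical, and a weak reference is used only to observe whether the object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical container; instances can point at each other."""


def make_cycle():
    # Build a two-object cycle and return only a weak reference to it,
    # so no strong reference survives outside the cycle.
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    return weakref.ref(a)


was_enabled = gc.isenabled()
gc.disable()
try:
    ref = make_cycle()
    # Reference counting alone cannot free the cycle.
    alive_without_gc = ref() is not None
    # An explicit cyclic collection reclaims it.
    gc.collect()
    alive_after_collect = ref() is not None
finally:
    if was_enabled:
        gc.enable()

print("alive with gc disabled:", alive_without_gc)
print("alive after gc.collect():", alive_after_collect)
```

With gc left disabled and no explicit gc.collect(), the two Node objects would stay in memory until the process exits.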