[Python-Dev] Billions of gc's

Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 23:55:41 -0400


[Aahz]
> My take is that programs with a million live objects and no cycles are
> common enough that gc should be designed to handle that smoothly.

Well, millions of live objects is common but isn't a problem.  The glitch
we're looking at is a surprising slowdown with millions of live *container*
objects.  The latter isn't so common.

> I don't think that a programmer casually writing such applications
> (say, processing information from a database) should be expected to
> understand gc well enough to tune it.

People casually writing applications pushing the limits of their boxes are
in for more surprises than just this <wink>.

> Having read the entire discussion so far, and *NOT* being any kind of
> gc expert, I would say that Tim's adaptive solution makes the most
> sense to me.  For years, we told people with cyclic data to figure out
> how to fix the problem themselves; now that we have gc available, I
> don't think we should punish everyone else.

We're not trying to punish anyone, but innocent users with lots of
containers can lose big despite our wishes:  if we don't check them for
cycles, they can run out of memory; if we do check them for cycles, it
necessarily consumes time.

As a datapoint, here are the times (in seconds) for justzip() on my box
after my checkin to precompute the result size (list.append behavior is
irrelevant now):

gc disabled:  0.64
gc enabled:   7.32
magic=2(*):   2.63
magic=3(*):   2.02

(*) This is gcmodule.c fiddled to add this block after "collections1 = 0;"
in the first branch of collect_generations():

		if (n == 0)
			threshold2 *= magic;
		else if (threshold2 > 5)
			threshold2 /= magic;

magic=1 is equivalent to the current code.  That's all an "adaptive scheme"
need amount to, provided the "*=" part were fiddled to prevent threshold2
from becoming insanely large.  Boosting magic above 3 didn't do any more
good in this test.
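
For concreteness, here is one way the "fiddled to prevent threshold2 from
becoming insanely large" part might look.  This is only a sketch, not what's
checked in:  the function name adapt_threshold2 and the cap MAX_THRESHOLD2
are made up for illustration, and the cap value is an arbitrary assumption.
The two branches otherwise mirror the block above.

```c
#include <assert.h>

/* Hypothetical bounds: neither name exists in gcmodule.c.  The floor of 5
   matches the "threshold2 > 5" test in the experiment; the cap is an
   arbitrary value chosen for illustration. */
#define MIN_THRESHOLD2 5
#define MAX_THRESHOLD2 1000000

/* Return the new threshold2 after a collection that found n unreachable
   objects.  magic == 1 leaves the threshold unchanged. */
static int
adapt_threshold2(int threshold2, int n, int magic)
{
	assert(magic >= 1);
	if (n == 0) {
		/* Nothing was collected:  collect less often, but clamp at
		   the cap and avoid overflowing the multiplication. */
		if (threshold2 <= MAX_THRESHOLD2 / magic)
			threshold2 *= magic;
		else
			threshold2 = MAX_THRESHOLD2;
	}
	else if (threshold2 > MIN_THRESHOLD2) {
		/* Garbage was found:  collect more often again. */
		threshold2 /= magic;
	}
	return threshold2;
}
```

With magic=3, a run of empty collections takes a threshold of 10 to 30, 90,
270, and so on up to the cap, and the first collection that finds garbage
divides it by 3 again.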

At magic=3 it still takes 3+ times longer than with gc disabled, but that's
a whale of a lot better than the current 11+ times longer.  Note that with
gc disabled, any cycle in any of the 1,000,001 containers this test creates
would leak forever -- casual users definitely get something back for the
time spent.
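
The "3+" and "11+" figures are just each timing divided by the gc-disabled
time.  A throwaway helper (the name slowdown is made up here) makes the
arithmetic explicit:

```c
/* Hypothetical helper:  how many times longer a run took than the
   gc-disabled baseline.  Both arguments are in seconds. */
static double
slowdown(double seconds, double baseline)
{
	return seconds / baseline;
}

/* Timings from the table above, gc disabled = 0.64:
     slowdown(7.32, 0.64)   gc enabled, a bit over 11
     slowdown(2.63, 0.64)   magic=2, a bit over 4
     slowdown(2.02, 0.64)   magic=3, a bit over 3   */
```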