[Python-Dev] iterzip()

Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 22:36:12 -0400


[Neil Schemenauer]
> Adding a fourth generation drops the time from 5.13 to 2.11 on my
> machine.  Adding a fifth doesn't seem to make a difference.  I used 10
> as the threshold for both new generations.

Alas, these thresholds are a little hard to work with.  For example:

      ...
	else if (collections0 > threshold1) {
		...
		collections1++;
		/* merge gen0 into gen1 and collect gen1 */
		...
		collections0 = 0;
	}
	else {
		generation = 0;
		collections0++;
		... /* collect gen0 */ ...
	}

Let's say threshold1 is 10 (because it is <wink>), and we just finished a
gen1 collection.  Then collections0 is 0.  We have to do 11 gen0 collections
before "collections0 > threshold1" succeeds, and that point is actually the
12th time gen0 has filled up since the last gen1 collection.

Similarly for collections1 vs threshold2.

This makes it hard to multiply them out in an obvious way <wink>.
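
The off-by-two above is easy to check with a small simulation.  This is
only a sketch in Python of the two branches quoted above, not the real C
code:  the function name is made up for illustration, and the elided
first branch (the collections1 vs threshold2 test) is left out entirely.

```python
def fills_until_gen1(threshold1):
    """Simulate the two quoted branches, starting right after a gen1 collection.

    Returns (gen0_collections, fills): how many gen0 collections run first,
    and which gen0 fill-up finally triggers the gen1 collection.
    Hypothetical helper; only the counter logic is modeled.
    """
    collections0 = 0          # reset by the previous gen1 collection
    fills = 0
    while True:
        fills += 1            # gen0 has filled up once more
        if collections0 > threshold1:
            # "merge gen0 into gen1 and collect gen1" branch
            return collections0, fills
        collections0 += 1     # plain gen0 collection


print(fills_until_gen1(10))   # (11, 12)
```

So with a threshold of N, a gen1 collection happens on every (N+2)th gen0
fill-up rather than every Nth, which is why the thresholds don't multiply
out cleanly.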

Anyway, with 4 generations it takes in the ballpark of 700 * 10 * 10 * 10 =
700,000 excess allocations before a gen3 collection is triggered, so I
expect you saw exactly one gen3 collection during the lifetime of the test
run (there are about 1,000,000 excess allocations during its run).  I also
expect that adding a fifth generation wouldn't matter at all in this test,
since you'd still see exactly one gen3 collection, and a gen4 collection
would never happen.
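
The ballpark can be spelled out as follows.  This is a rough sketch that
multiplies the thresholds naively (so it ignores the off-by-two wrinkle
described earlier); the helper name is invented for illustration, and the
1,000,000 figure is the approximate count quoted above.

```python
def naive_trigger_point(threshold0, later_thresholds):
    """Naive estimate of excess allocations needed to reach the oldest generation."""
    n = threshold0
    for t in later_thresholds:
        n *= t
    return n


test_allocations = 1_000_000                            # approximate, from the test run

gen3_at = naive_trigger_point(700, [10, 10, 10])        # 4 generations
gen4_at = naive_trigger_point(700, [10, 10, 10, 10])    # 5 generations

print(gen3_at, test_allocations // gen3_at)   # 700000 1
print(gen4_at, test_allocations // gen4_at)   # 7000000 0
```

By this estimate the test run is long enough for one gen3 collection but
falls well short of the 7,000,000 excess allocations a gen4 collection
would need.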

Now another ballpark:  On the only machine that matters in real life (mine),
I'm limited to 2**31 bytes of user address space, and an object
participating in gc can rarely be smaller than 40 bytes.  That means I can't
have more than 2**31/40 ~= 54 million gcable objects alive at once, and that
also bounds the aggregate excess of allocations over deallocations.  That
surprised me.  It means the "one million tuple" test is already taxing a
non-trivial percentage of this box's theoretical capacity.  Indeed, I tried
boosting it to 10 million, and after glorious endless minutes of listening
to the disk grind itself to dust (with gc disabled, even), Win98 rebooted
itself.
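
For the record, the same back-of-the-envelope bound worked out in Python.
It is only a sketch:  the constant and function names are made up here,
and the 40-byte minimum is the rough figure from the paragraph above, not
a measured value.  The exact quotient comes out a little under the rounded
figure quoted above.

```python
ADDRESS_SPACE_BYTES = 2**31     # user address space on the box described above
MIN_GC_OBJECT_BYTES = 40        # rough lower bound on a gc-tracked object

# Upper bound on gc-tracked objects alive at once.
max_live_objects = ADDRESS_SPACE_BYTES // MIN_GC_OBJECT_BYTES
print(max_live_objects)         # 53687091


def share_of_capacity(n_objects):
    """Fraction of the theoretical bound used by n_objects live objects."""
    return n_objects / max_live_objects


for n in (1_000_000, 10_000_000):
    print(f"{n:>10,} objects: {share_of_capacity(n):.1%} of the bound")
```

One million live tuples is already close to 2% of the bound, and ten
million is close to 19%, before counting anything else the process needs.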

So another factor-of-10 generation or two would probably move the gross
surprises here out of the realm of practical concern.  Except, of course,
for the programs where it wouldn't <wink>.