Tremendous slowdown due to garbage collection

andreas.eisele at
Sat Apr 12 16:01:48 EDT 2008

> Martin said that the default settings for the cyclic gc works for most
> people.

I agree.

> Your test case has found a pathologic corner case which is *not*
> typical for common application but typical for an artificial benchmark.

I agree that my "corner" is not typical, but I strongly disagree with
the classification as pathological. The only untypical feature of my
test case is the huge number of distinct objects that are allocated.
I admit that 1E7 objects is still fairly unusual today, but there is
nothing pathological about it; it is just bigger. It is about as
pathological as a file size >2GB, which a few years ago seemed so
outrageous that no OS bothered to support it, but is fairly common
nowadays, so that a lack of support would now appear as an arbitrary
and unmotivated limitation. We all enjoy seeing Python adopted on a
large scale and used by a broad community, so we should not accept
arbitrary size limits. You could call a string of more than 2GB
pathological, but I very much appreciate the fact that Python supports
such strings for the few cases where they are needed (on a 64-bit
architecture). An O(N*N) effort for large numbers of objects is not
such a hard limit, but in practice it has the same effect: people
cannot use Python in such circumstances. I would very much prefer it
if such "soft limits" could be avoided as well.
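
To make the shape of the problem concrete, here is a minimal sketch of
the kind of allocation pattern I have in mind (an illustration only,
not my actual benchmark):

    import time

    def build(n):
        data = []
        t0 = time.time()
        for i in xrange(n):
            # small container objects, all tracked by the cyclic GC
            data.append((i, [i]))
            if (i + 1) % 1000000 == 0:
                elapsed = time.time() - t0
                print "%8d objects, %6.2f s" % (i + 1, elapsed)
        return data

    if __name__ == "__main__":
        build(10000000)   # ~1E7 objects, roughly the size I mean

With the collector enabled, the time per million objects keeps growing,
because the cyclic GC repeatedly traverses everything already alive.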

Given that there is a fairly simple workaround (thanks again to
Amaury!), the issue is not urgent, but I still think it is important
in the long run.
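
For the record, the workaround amounts to switching the cyclic
collector off around the bulk allocation (this is my summary of the
idea, not Amaury's exact code):

    import gc

    gc.disable()        # suspend the cyclic collector while allocating
    try:
        data = [(i, [i]) for i in xrange(10000000)]
    finally:
        gc.enable()     # switch it back on afterwards
        gc.collect()    # one explicit pass for any cycles made meanwhile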

> Python is optimized for regular apps, not for benchmark (like some video
> drivers).

I still think it would be worthwhile to support very large numbers of
objects in such a way that they can simply be used, without knowledge
of special tricks, and I am fairly optimistic that those who designed
the current GC scheme could generalize it slightly so that these
marginal cases work better without imposing a penalty on the more
typical cases.
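
In the meantime, as far as I understand the gc module, one can already
make the collector much less eager by raising its thresholds; the
numbers below are only my own guess at a middle ground, not something
proposed in this thread:

    import gc

    print gc.get_threshold()          # (700, 10, 10) by default
    gc.set_threshold(100000, 10, 10)  # collect far less often while
                                      # many objects are being created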

> By the way you shouldn't use range for large ranges of more than a
> thousand items. xrange() should be faster and it will definitely use
> much less memory - and memory Python 2.5 and older will never release
> again. I'm going to fix the issue for Python 2.6 and 3.0.

Thanks for this hint, and for the work on the newer versions. This is
very much appreciated.
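
For anyone who runs into the same thing, the difference boils down to
this (a trivial sketch, of course):

    # range() materializes the whole list up front; xrange() yields the
    # numbers lazily, so the second loop needs only constant memory.
    for i in range(10000000):     # builds a list of 1E7 ints first
        pass
    for i in xrange(10000000):    # no big list, just a lazy counter
        pass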

