Tremendous slowdown due to garbage collection

Steve Holden steve at holdenweb.com
Sat Apr 12 23:58:22 CEST 2008


andreas.eisele at gmail.com wrote:
>> Martin said that the default settings for the cyclic gc works for most
>> people.
> 
> I agree.
> 
>> Your test case has found a pathologic corner case which is *not*
>> typical for common application but typical for an artificial benchmark.
> 
> I agree that my "corner" is not typical, but I strongly disagree with
> the classification as pathological. The only feature of my test case
> that is not typical is the huge number of distinct objects that are
> allocated. I admit that 1E7 objects is today still fairly untypical,
> but there is nothing pathological about it, it is just bigger. I is
> about as pathological as a file size >2G, which a few years ago seemed
> so outrageous that no OS bothered to support it, but is fairly common
> nowadays, so that a lack of support would appear as an arbitrary and
> unmotivated limitation nowadays. We all enjoy seeing Python adopted on
> a large scale and used by a broad community, so we should not accept
> arbitrary size limits. You could call a string with more than 2GB
> pathological, but I very much appreciate the fact that Python supports
> such strings for the few cases where they are needed (on a 64 bit
> architecture). Now a O(N*N) effort for large numbers of objects isn't
> such a hard limit, but in practice boils down to the same effect, that
> people cannot use Python in such circumstances. I would prefer it very
> much if such "soft limits" could be avoided as well.
> 
> Given there is a fairly simple workaround (thanks again to Amaury!),
> the issue is not urgent, but I still think it is important in the long
> run.
> 
>> Python is optimized for regular apps, not for benchmark (like some video
>> drivers).
>>
> 
> I still think it would be worthwhile to support very large numbers of
> objects in a way that they can just be used, without knowledge of
> special tricks, and I would be fairly optimistic that those who have
> designed the current GC schemes could generalize them slightly so that
> these marginal cases will work better without imposing a penalty on
> the more typical cases.
> 
I believe you are making surmises outside your range of competence 
there. While your faith in the developers is touching, the garbage 
collection scheme is something that has received a lot of attention with 
respect to performance under typical workloads over the years.

By the way, the term "pathological" (which I believe I also used about 
your application) doesn't imply anything bad about your program: it's 
merely a shorthand way of saying that it triggers this undesirable 
behavior in a garbage collector that is mostly satisfactory for other 
purposes.

>> By the way you shouldn't use range for large ranges of more than a
>> thousand items. xrange() should be faster and it will definitely use
>> much less memory - and memory Python 2.5 and older will never release
>> again. I'm going to fix the issue for Python 2.6 and 3.0.
>>
> 
> Thanks for this hint, and for the work on the newer versions. This is
> very much appreciated.
> 
Anyway, I'm glad Amaury's workaround has solved your problem at least 
temporarily. Your type of workload may become more typical over time, 
but at the moment you are probably somewhat ahead of the mainstream 
here. If you know there can't be any recoverable cycles then you are 
probably better off just telling the garbage collector that you know 
better than it does.

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/




More information about the Python-list mailing list