GC is very expensive: am I doing something wrong?

Stefan Behnel stefan_ml at behnel.de
Mon Mar 22 08:42:07 CET 2010

Lawrence D'Oliveiro, 22.03.2010 00:36:
> Terry Reedy wrote:
>> No one has discovered a setting
>> of the internal tuning parameters for which there are no bad patterns
>> and I suspect there are not any such. This does not negate Xavier's
>> suggestion that a code change might also solve your problem.
> Could it be that for implementing a structure like a trie as the OP is,
> where a lot of CPU cycles can be spent manipulating the structure, a high-
> level language like Python, Perl or Ruby just gets in the way?

I would rather say that the specific problem of the trie data structure is 
that it has extremely little benefit over other available data structures. 
There may still be a couple of niches where it makes sense to consider it 
as an alternative, but given that dicts are so heavily optimised in Python, 
it'll be hard for tries to compete even when written in a low-level language.

Remember that the original use case was to load a dictionary from a text 
file. For this use case, a trie can be very wasteful in terms of memory and 
rather CPU cache unfriendly on traversal, whereas hash values are a) rather 
fast to calculate for a string, and b) often just calculated once and then 
kept alive in the string object for later reuse.

> My feeling would be, try to get the language to do as much of the work for
> you as possible. If you can’t do that, then you might be better off with a
> lower-level language.

I agree with the first sentence, but I'd like to underline the word 'might' 
in the second. As this newsgroup shows, very often it's enough to look for 
a better algorithmic approach first.


More information about the Python-list mailing list