GC is very expensive: am I doing something wrong?

Mon Mar 22 22:11:57 EDT 2010

Le Mon, 22 Mar 2010 23:40:16 +0000, tan a écrit :
> 
>>Remember that the original use case was to load a dictionary from a text
>>file. For this use case, a trie can be very wasteful in terms of memory
>>and rather CPU cache unfriendly on traversal, whereas hash values are a)
>>rather fast to calculate for a string, and b) often just calculated once
>>and then kept alive in the string object for later reuse.
> 
> You still have to walk the bucket in a hash map/table. Performance may
> be orders of magnitude worse than for trees.

"walk the bucket" shouldn't be a significant cost factor here, especially 
if you are doing meaningful work with the traversed items. In the CPython 
implementation the total hash table size is less than a constant times 
the number of actual items. Moreover, it's a linear scan over an array 
rather than having to dereference pointers as in tree.

"Orders of magnitude worse", in any case, sounds very exaggerated.

(and, of course, as the OP said, there's the major difference that the 
dict type is implemented in C, which makes constant factors an order of 
magnitude smaller than for a Python trie implementation)