Tremendous slowdown due to garbage collection

John Nagle nagle at animats.com
Sat Apr 12 12:39:46 EDT 2008


andreas.eisele at gmail.com wrote:
> In an application dealing with very large text files, I need to create
> dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested
> dictionaries. The number of distinct data structures in memory grows
> to well beyond 1E7.
> 
> It turns out that the default behaviour of Python is not very suitable
> for such a situation, as garbage collection occasionally traverses all
> objects in memory (including all tuples) in order to find out which
> could be collected. As a result, creating O(N)
> objects effectively takes O(N*N) time.
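
    For concreteness, a minimal sketch of the kind of build loop described
above; the function name and word-list input are illustrative, not from the
original message.  A common workaround, distinct from the weak-reference
suggestion below, is to suspend the cyclic collector during the bulk build,
since reference counting alone still reclaims all non-cyclic garbage:

    import gc

    def count_bigrams(words):
        # Count bigram frequencies; the dict keys are 2-tuples of words.
        counts = {}
        gc.disable()       # suspend the cyclic collector during the bulk build
        try:
            for pair in zip(words, words[1:]):
                counts[pair] = counts.get(pair, 0) + 1
        finally:
            gc.enable()    # restore normal collection afterwards
        return counts

With the collector suspended, the build takes time roughly linear in the
number of objects created, because only reference counting is doing the work.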

    Do your data structures need garbage collection?  CPython is
a reference-counted system with garbage collection as a backup
to catch cycles.  Try using weak back-pointers in your data
structures to avoid creating cycles, as in the sketch below.
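
    A minimal sketch of the weak back-pointer idea, assuming a tree in
which each child keeps a reference back to its parent; the class and
attribute names are illustrative:

    import weakref

    class Node(object):
        def __init__(self, value, parent=None):
            self.value = value
            self.children = []
            # Keep the back-pointer weak so parent <-> child is not a cycle.
            self._parent = weakref.ref(parent) if parent is not None else None

        @property
        def parent(self):
            # Dereference the weakref; returns None if the parent is gone.
            return self._parent() if self._parent is not None else None

    root = Node("root")
    child = Node("leaf", parent=root)
    root.children.append(child)
    assert child.parent is root

Because no cycles exist, reference counting reclaims the whole structure as
soon as the last external reference is dropped, and the cyclic collector has
nothing left to traverse.  Note that the owner of the tree must hold a strong
reference to the root; once that goes away, the weak back-pointers return None.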

				John Nagle


