Tremendous slowdown due to garbage collection
John Nagle
nagle at animats.com
Sat Apr 12 12:39:46 EDT 2008
andreas.eisele at gmail.com wrote:
> In an application dealing with very large text files, I need to create
> dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested
> dictionaries. The number of different data structures in memory grows
> into orders beyond 1E7.
>
> It turns out that the default behaviour of Python is not very suitable
> for such a situation, as garbage collection occasionally traverses all
> objects in memory (including all tuples) in order to find out which
> could be collected. This leads to the sitation that creating O(N)
> objects effectively takes O(N*N) time.
Do your data structures need garbage collection? CPython is
a reference counted system with garbage collection as a backup
to catch cycles. Try using weak back-pointers in your data
structures to avoid creating cycles.
John Nagle
More information about the Python-list
mailing list