Tremendous slowdown due to garbage collection
aaron.watters at gmail.com
Tue Apr 15 18:35:27 CEST 2008
On Apr 14, 11:18 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> However, that is for the OP to decide. The reason I don't like the
> sort of question I posed is it's presumptuous--maybe the OP already
> considered and rejected this, and has taken steps to ensure the in
> memory data structure won't be swapped--but a database solution should
> at least be considered here.
Yes, you are right, especially if the index structure will be needed
many times over a long period of time. Even here though, these days,
you can go pretty far by loading everything into core (streaming from
disk) and dumping everything out when you are done, if needed
(ahem, using the preferred way to do this from Python for
speed and safety: marshal ;) ).
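
For anyone who hasn't tried it, here is a minimal sketch of the load/dump round trip with marshal. The index contents, the file name, and the helper names dump_index and load_index are made up for illustration. Keep in mind that marshal only handles built-in types, that its file format can change between Python versions, and that the docs warn against loading data you don't trust.

```python
import marshal
import os
import tempfile


def dump_index(index, path):
    """Write a structure of built-in types to disk in one pass."""
    with open(path, "wb") as f:
        marshal.dump(index, f)


def load_index(path):
    """Read the whole structure back into memory in one pass."""
    with open(path, "rb") as f:
        return marshal.load(f)


# Hypothetical in-memory index: word -> list of document ids.
index = {"python": [1, 4, 9], "garbage": [2, 4]}

path = os.path.join(tempfile.mkdtemp(), "index.marshal")
dump_index(index, path)
restored = load_index(path)
```

If the data has to survive a Python upgrade or contains instances of your own classes, marshal is the wrong tool and pickle or a real database is the safer choice.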
Even with B-trees, if you jump around in the tree the performance can be
awful. This is why Nucular, for example, is designed to stream
results sequentially from disk whenever possible. The one place where
it doesn't do this very well (proximity searches) shows the most
problems with performance (under bad circumstances like searching
for two common words in proximity).
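
To illustrate what I mean by streaming sequentially, here is a rough sketch (not Nucular's actual code) of intersecting two ascending sequences of document ids in a single forward pass. The function name intersect_sorted and the sample ids are invented for the example, and it assumes both inputs are already sorted in ascending order.

```python
def intersect_sorted(a, b):
    """Yield values found in both ascending iterables.

    Each input is consumed strictly front to back, so the inputs can be
    iterators that read sequentially from disk; nothing is revisited.
    """
    it_a, it_b = iter(a), iter(b)
    try:
        x, y = next(it_a), next(it_b)
        while True:
            if x == y:
                yield x
                x, y = next(it_a), next(it_b)
            elif x < y:
                x = next(it_a)  # advance the stream that is behind
            else:
                y = next(it_b)
    except StopIteration:
        # One stream is exhausted, so no further matches are possible.
        return


# Hypothetical posting lists for two words.
docs_with_first = [1, 3, 5, 7, 9]
docs_with_second = [3, 4, 5, 9, 10]
common = list(intersect_sorted(docs_with_first, docs_with_second))
```

The pass costs at most one read per entry in each list, which is why it stays tolerable even when both lists are long.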
-- Aaron Watters