Orders of magnitude

Paul Rubin
Mon Mar 29 09:53:08 CEST 2004

"Robert Brewer" <fumanchu at amor.org> writes:
> I'm dedup'ing a 10-million-record dataset, trying different approaches
> for building indexes. The in-memory dicts are clearly faster, but I get
> Memory Errors (Win2k, 512 MB RAM, 4 G virtual). Any recommendations on
> other ways to build a large index without slowing down by a factor of
> 25?

Sort, then remove dups.
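
A rough sketch of what that could look like when the data does not fit in RAM: sort bounded-size chunks, spill each to a temporary file, merge the sorted runs, and drop adjacent duplicates. The function names and the chunk_size parameter are made up for illustration, and the sketch assumes each record is a string with no embedded newline. It is written for current Python 3, not the interpreter of the original thread.

```python
import heapq
import itertools
import os
import tempfile


def dedup_sorted(items):
    """Yield each distinct item from an already-sorted iterable."""
    # In sorted input, equal items are adjacent, so groupby collapses them.
    for key, _group in itertools.groupby(items):
        yield key


def _write_run(sorted_chunk):
    """Write one sorted chunk to a temp file, one record per line."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(rec + "\n" for rec in sorted_chunk)
    return path


def _read_run(path):
    """Stream records back from a run file, one at a time."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_dedup(records, chunk_size=100_000):
    """Sort in chunks of at most chunk_size, merge the runs, drop dups."""
    paths = []
    it = iter(records)
    try:
        while True:
            chunk = sorted(itertools.islice(it, chunk_size))
            if not chunk:
                break
            paths.append(_write_run(chunk))
        # heapq.merge holds only one pending record per run in memory.
        merged = heapq.merge(*(_read_run(p) for p in paths))
        yield from dedup_sorted(merged)
    finally:
        for p in paths:
            os.remove(p)
```

Only one chunk is held in memory at a time, so peak usage is governed by chunk_size, not by the size of the dataset. The result comes out in sorted order, not in the original input order.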
