removing duplication from a huge list.

bearophileHUGS at lycos.com
Fri Feb 27 09:58:39 CET 2009


odeits:
> How big of a list are we talking about? If the list is so big that the
> entire list cannot fit in memory at the same time, this approach won't
> work, e.g. removing duplicate lines from a very large file.

If the data are lines of a file, and keeping the original order isn't
important, then the first thing to try may be the Unix commands sort and
uniq (available through Cygwin on Windows).
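
The reason the two commands are used together is that uniq only collapses
adjacent duplicate lines, so the input has to be sorted first (something
like "sort input.txt | uniq > output.txt", where the file names are only
placeholders). As a rough sketch of what the uniq step does, here is a
small Python equivalent. It assumes the lines arrive already sorted, for
example from an external sort; the function name and the sample data are
made up for illustration.

```python
from itertools import groupby


def uniq(sorted_lines):
    """Yield each distinct line once from an already-sorted iterable.

    Like the uniq command, this only collapses *adjacent* duplicates,
    so it never needs to hold more than the current line in memory.
    """
    for line, _group in groupby(sorted_lines):
        yield line


# Hypothetical sample standing in for the output of an external sort.
sorted_lines = ["apple\n", "apple\n", "banana\n", "cherry\n", "cherry\n"]
deduped = list(uniq(sorted_lines))
```

Because the function is a generator, it can be fed an open file object
and written out line by line, so the whole data set never has to fit in
memory; the expensive part, the sort itself, is left to the external tool.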

Bye,
bearophile


