removing duplication from a huge list.

Stefan Behnel stefan_ml at behnel.de
Fri Feb 27 04:18:06 EST 2009


bearophileHUGS at lycos.com wrote:
> odeits:
>> How big of a list are we talking about? If the list is so big that the
>> entire list cannot fit in memory at the same time, this approach won't
>> work, e.g. when removing duplicate lines from a very large file.
> 
> If the data are lines of a file, and keeping the original order isn't
> important, then the first thing to try may be the Unix (or Cygwin on
> Windows) commands sort and uniq.

Or preferably "sort -u", if that's supported, since it sorts and removes
duplicates in a single step.
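
For anyone who wants to stay in Python, here is a rough sketch of the same
sort-then-drop-adjacent-duplicates idea for a file that doesn't fit in
memory. It is not how sort(1) is implemented, just an illustration. The
names sort_unique, _write_sorted_chunk and chunk_lines are made up for this
example, and lines are compared by code point, which may order them
differently from a locale-aware sort.

```python
import heapq
import itertools
import os
import tempfile


def _write_sorted_chunk(lines):
    """Write one sorted, de-duplicated chunk to a temp file; return its path."""
    tmp = tempfile.NamedTemporaryFile("w", delete=False, suffix=".chunk")
    with tmp:
        tmp.writelines(sorted(set(lines)))
    return tmp.name


def sort_unique(src_path, dst_path, chunk_lines=100_000):
    """Hypothetical helper: sorted, duplicate-free copy of src_path in dst_path."""
    chunk_paths = []
    try:
        with open(src_path) as src:
            # Normalise endings so a last line without "\n" still compares equal.
            normalised = (line.rstrip("\n") + "\n" for line in src)
            while True:
                # Only chunk_lines lines are held in memory at any one time.
                chunk = list(itertools.islice(normalised, chunk_lines))
                if not chunk:
                    break
                chunk_paths.append(_write_sorted_chunk(chunk))

        files = [open(path) for path in chunk_paths]
        try:
            with open(dst_path, "w") as dst:
                previous = None
                # heapq.merge streams the sorted chunks in order, so equal
                # lines arrive next to each other and are easy to skip.
                for line in heapq.merge(*files):
                    if line != previous:
                        dst.write(line)
                        previous = line
        finally:
            for f in files:
                f.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Calling sort_unique("in.txt", "out.txt") would leave the sorted, unique lines
in out.txt (both file names are placeholders). All chunk files are open at
once during the merge, so a very small chunk_lines on a very large input can
run into the per-process open-file limit.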

Stefan


