number of different lines in a file

Andrew Robert andrew.arobert at gmail.com
Fri May 19 00:40:12 CEST 2006


r.e.s. wrote:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
> 
> On my PC, this little program just goes to never-never land:
> 
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L) 
> 
> Would anyone care to point out improvements? 
> Is there a better algorithm for doing this?
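The loop above is slow because `x not in L` scans the whole list for every line, and `L = L + [x]` copies the entire list each time a new line is found. When most lines are distinct, the work is roughly quadratic in the number of lines. Below is a minimal sketch that uses a set instead, which has average constant-time membership. It assumes Python 3, so `open` replaces `file`, and it keeps the original function name and `strip()` behaviour. There is one deliberate difference: the original `while x<>''` test also stops at the first blank line, because a stripped blank line is `''`. This version reads to the end of the file and counts blank lines as one more distinct value.

```python
def number_distinct(fn):
    """Count distinct lines in the file named fn (sketch, Python 3).

    A set gives average O(1) membership tests, so one pass over the
    file is roughly linear instead of quadratic.
    """
    with open(fn) as f:
        # strip() mirrors the original code: it removes the newline and
        # any other leading/trailing whitespace before comparing lines.
        return len({line.strip() for line in f})
```

With a million 100-character lines, the set holds at most a million strings. That should fit comfortably in memory on a typical PC.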

Take a look at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

It is a Python take on the *nix uniq command.
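On the shell, the uniq route to the same count is `sort file | uniq | wc -l`. The sort is needed because uniq only collapses adjacent duplicates. Below is a minimal Python sketch of that idea, not necessarily what the recipe itself does. The function name `number_distinct_sorted` is hypothetical. The sketch sorts the lines and then counts runs of equal neighbours with `itertools.groupby`. Sorting costs O(n log n) rather than the set's average O(n), but this approach also yields the distinct lines in sorted order if you need them.

```python
from itertools import groupby


def number_distinct_sorted(fn):
    """Count distinct lines like `sort fn | uniq | wc -l` (sketch).

    Only the trailing newline is removed, so lines that differ in other
    whitespace count as different, as they would for uniq.
    """
    with open(fn) as f:
        lines = sorted(line.rstrip("\n") for line in f)
    # After sorting, equal lines are adjacent; groupby yields one
    # (key, run) pair per run of equal lines, so each counts once.
    return sum(1 for _key, _run in groupby(lines))
```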


