number of different lines in a file
Andrew Robert
andrew.arobert at gmail.com
Thu May 18 18:40:12 EDT 2006
r.e.s. wrote:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
>
> On my PC, this little program just goes to never-never land:
>
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L)
>
> Would anyone care to point out improvements?
> Is there a better algorithm for doing this?
Take a look at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
It is a Python approach to the uniq command on *nix.
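
As for why the quoted program is slow: "x not in L" scans the whole list for every line, and "L = L + [x]" copies the list each time a new line is found, so the work grows roughly with the square of the number of distinct lines. A minimal sketch of an alternative, assuming the distinct lines fit in memory, is to collect them in a set, where a membership check takes roughly constant time on average. This is not taken from the recipe above; it uses modern Python syntax (open and a with block), and the name "seen" is just illustrative.

```python
def number_distinct(fn):
    """Count the distinct lines in the file named fn.

    Lines are compared after stripping surrounding whitespace,
    as in the quoted program.
    """
    seen = set()  # illustrative name; holds each distinct stripped line once
    with open(fn) as f:
        # Iterating over the file reads one line at a time.
        for line in f:
            seen.add(line.strip())
    return len(seen)
```

One difference from the quoted version: this sketch reads to the end of the file and counts an empty line as one more distinct value, whereas the original loop stops at the first line that is empty after stripping.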