number of different lines in a file
r.s at ZZmindspring.com
Fri May 19 02:04:30 CEST 2006
"Tim Chase" <python.list at tim.thechases.com> wrote ...
> 2) use a python set:
> s = set()
> for line in open("file.in"):
>     s.add(line.strip())
> return len(s)
> 3) compact #2:
> return len(set([line.strip() for line in file("file.in")]))
> or, if stripping the lines isn't a concern, it can just be
> return len(set(file("file.in")))
> The logic in the set keeps track of ensuring that no
> duplicates get entered.
> Depending on how many results you *expect*, this could
> become cumbersome, as you have to have every unique line in
> memory. A stream-oriented solution can be kinder on system
> resources, but would require that the input be sorted first.
Thank you (and all the others who responded!) -- set() does
the trick, reducing the job to about a minute. I may play
later with the other alternatives people mentioned (dict(),
hash(),...), just out of curiosity. I take your point about
the "expected number", which in my case was around 0-10 (as
it turned out, there were no dups).
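For comparison, here is a rough sketch of the stream-oriented
alternative from the quoted message. It assumes the input has already
been sorted (for example with an external sort), so that duplicate
lines sit next to each other. The function name count_unique_sorted
and the sample data are made up for illustration.

```python
import io
from itertools import groupby

def count_unique_sorted(lines):
    """Count distinct lines in an iterable of lines that is ALREADY sorted."""
    # groupby collapses each run of equal adjacent items into one group,
    # so no set of all unique lines has to be kept in memory.
    stripped = (line.strip() for line in lines)
    return sum(1 for _key, _group in groupby(stripped))

# Hypothetical sample standing in for a sorted file.
sample = io.StringIO("apple\napple\nbanana\ncherry\ncherry\n")
print(count_unique_sorted(sample))
```

If the input is not sorted, this overcounts, because equal lines that
are not adjacent land in separate groups.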
BTW, the first thing I tried was Fredrik Lundh's program:
return len(set(s.strip() for s in open(fn)))
which worked without the square brackets. Interesting that
omitting them doesn't seem to matter.
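The likely reason it doesn't matter: without the brackets the argument
is a generator expression (available since Python 2.4) rather than a
list comprehension. set() accepts any iterable, so the result is the
same. The bracketed form builds the whole list before set() sees it,
while the generator hands over one stripped line at a time. A small
sketch, with made-up function names and sample data:

```python
import io

def count_unique_genexp(lines):
    # Generator expression: set() pulls stripped lines one by one,
    # and no intermediate list is created.
    return len(set(s.strip() for s in lines))

def count_unique_listcomp(lines):
    # List comprehension: the full list of stripped lines is built
    # first, then passed to set().
    return len(set([s.strip() for s in lines]))

data = "one\ntwo\none\nthree\ntwo\n"  # hypothetical file contents
print(count_unique_genexp(io.StringIO(data)))
print(count_unique_listcomp(io.StringIO(data)))
```

Both calls print the same count. When a generator expression is the
only argument to a function call, the extra pair of parentheses around
it may be dropped, which is why the call in Fredrik Lundh's version is
valid syntax.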