number of different lines in a file
r.e.s.
r.s at ZZmindspring.com
Thu May 18 20:04:30 EDT 2006
"Tim Chase" <python.list at tim.thechases.com> wrote ...
> 2) use a python set:
>
> s = set()
> for line in open("file.in"):
>     s.add(line.strip())
> return len(s)
>
> 3) compact #2:
>
> return len(set([line.strip() for line in file("file.in")]))
>
> or, if stripping the lines isn't a concern, it can just be
>
> return len(set(file("file.in")))
>
> The logic in the set keeps track of ensuring that no
> duplicates get entered.
>
> Depending on how many results you *expect*, this could
> become cumbersome, as you have to have every unique line in
> memory. A stream-oriented solution can be kinder on system
> resources, but would require that the input be sorted first.
Thank you (and all the others who responded!) -- set() does
the trick, reducing the job to about a minute. I may play
later with the other alternatives people mentioned (dict(),
hash(),...), just out of curiosity. I take your point about
the "expected number", which in my case was around 0-10 (as
it turned out, there were no dups).
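For reference, the stream-oriented approach mentioned in the quoted message might look roughly like the sketch below. It assumes the lines arrive already sorted (for example from an external sort), so equal lines are adjacent; the function name count_distinct_sorted is hypothetical, and any stripping or other normalization is assumed to have been done before sorting.

```python
from itertools import groupby


def count_distinct_sorted(sorted_lines):
    """Count distinct lines in an iterable that is ALREADY sorted.

    groupby() collapses each run of adjacent equal items into one group,
    so only the current line has to be kept in memory, not every
    distinct line seen so far.
    """
    return sum(1 for _ in groupby(sorted_lines))


# Small in-memory stand-in for a sorted file (illustrative data only).
sample = ["apple\n", "apple\n", "banana\n", "cherry\n", "cherry\n"]
distinct = count_distinct_sorted(sample)
```

If the input is not sorted, equal lines that are not adjacent are counted more than once, so this is only a substitute for set() when sorting is cheap or already done.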
BTW, the first thing I tried was Fredrik Lundh's program:
def number_distinct(fn):
    return len(set(s.strip() for s in open(fn)))
which worked without the square brackets. Interesting that
omitting them doesn't seem to matter.
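The likely reason: without the brackets the argument is a generator expression (available since Python 2.4) rather than a list comprehension, and set() accepts any iterable. A minimal comparison, using a made-up list in place of a file:

```python
# Illustrative data standing in for the lines of a file.
lines = ["a\n", "b\n", "a\n"]

# List comprehension: builds the whole list first, then hands it to set().
with_brackets = len(set([s.strip() for s in lines]))

# Generator expression: set() pulls the items one at a time, with no
# intermediate list. When it is the only argument of a call, it does not
# need its own pair of parentheses.
without_brackets = len(set(s.strip() for s in lines))
```

Both forms give the same count; the bracket-free one just avoids building the temporary list.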