How to count lines in a text file?
Alex Martelli
aleaxit at yahoo.com
Wed Sep 22 15:17:01 EDT 2004
Bengt Richter <bokr at oz.net> wrote:
...
> >memory at once. If you must be able to deal with humongous files, too
> >big to fit in memory at once, try something like:
> >
> >numlines = 0
> >for line in open('text.txt'): numlines += 1
>
> I don't have 2.4
2.4a3 is freely available for download and everybody's _encouraged_ to
download it and try it out -- come on, don't be the last one to!-)
> but how would that compare with a generator expression like (untested)
>
> sum(1 for line in open('text.txt'))
>
> or, if you _are_ willing to read in the whole file,
>
> open('text.txt').read().count('\n')
I'm not on the same machine as when I ran the other timing measurements
(including Pyrex &c), but here are the results on this machine:
$ wc /usr/share/dict/words
234937 234937 2486825 /usr/share/dict/words
$ python2.4 ~/cb/timeit.py "numlines=0
for line in file('/usr/share/dict/words'): numlines+=1"
10 loops, best of 3: 3.08e+05 usec per loop
$ python2.4 ~/cb/timeit.py "file('/usr/share/dict/words').read().count('\n')"
10 loops, best of 3: 2.72e+05 usec per loop
$ python2.4 ~/cb/timeit.py "len(file('/usr/share/dict/words').readlines())"
10 loops, best of 3: 3.25e+05 usec per loop
$ python2.4 ~/cb/timeit.py "sum(1 for line in file('/usr/share/dict/words'))"
10 loops, best of 3: 4.42e+05 usec per loop
Last but not least...:
$ python2.4 ~/cb/timeit.py -s'import cou' "cou.cou(file('/usr/share/dict/words'))"
10 loops, best of 3: 2.05e+05 usec per loop
where cou.pyx is the Pyrex program I've already shown on the other
subthread. Using the count.c I've also shown takes 2.03e+05 usec.
(Can't try psyco here: not an Intel-like CPU.)
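For anyone wanting to rerun the pure-Python comparison from inside Python
rather than from the shell, a rough sketch using the timeit module's Timer
class might look like the following. It assumes a current Python 3 (so
open() rather than file(), and Timer's globals argument), and the names
make_sample, STATEMENTS and best_times are made up here for illustration.
Absolute numbers will differ from the ones above.

```python
import os
import tempfile
import timeit


def make_sample(num_lines=100000):
    """Write a throwaway file of short lines and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("word%d\n" % i)
    return path


# The four pure-Python statements timed above, with open() standing in for file().
STATEMENTS = {
    "plain loop": "n = 0\nfor line in open(path): n += 1",
    "read().count": "open(path).read().count('\\n')",
    "len(readlines())": "len(open(path).readlines())",
    "sum(genexp)": "sum(1 for line in open(path))",
}


def best_times(path, number=10, repeat=3):
    """Return the best seconds-per-loop for each statement, like 'best of 3'."""
    results = {}
    for label, stmt in STATEMENTS.items():
        timer = timeit.Timer(stmt, globals={"path": path})
        # repeat() returns total seconds for `number` runs, once per repetition.
        results[label] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results
```

Calling best_times(make_sample()) returns a dict mapping each label to its
best per-loop time in seconds; remember to os.remove() the sample file
afterwards.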
Summary: "sum(1 for ...)" is no speed demon; the plain loop is best
among the pure-python approaches for files that can't fit in memory. If
the file DOES fit in memory, read().count('\n') is faster, but
len(...readlines()) is slower. Pyrex rocks, essentially removing the
need for C-coded extensions (less than a 1% advantage) -- and so does
psyco, but not if you're using a Mac (quick, somebody gift Armin Rigo
with a Mac before it's too late...!!!).
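As a sketch only, the four pure-Python approaches compared above could be
wrapped as functions like this. It assumes a current Python 3, where file()
no longer exists and open() is used instead; the function names are
invented for illustration and the files are closed explicitly with "with".

```python
def count_loop(path):
    # Plain loop: holds only one line in memory at a time.
    numlines = 0
    with open(path) as f:
        for line in f:
            numlines += 1
    return numlines


def count_newlines(path):
    # Reads the whole file; counts newline characters, so a final line
    # without a trailing newline is not counted.
    with open(path) as f:
        return f.read().count("\n")


def count_readlines(path):
    # Reads the whole file into a list of lines, then takes its length.
    with open(path) as f:
        return len(f.readlines())


def count_genexp(path):
    # Generator expression: also one line at a time, like the plain loop.
    with open(path) as f:
        return sum(1 for line in f)
```

Note that the approaches only agree when the file ends with a newline: for
a file whose last line is unterminated, count_newlines returns one less
than the other three.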
Alex