[Tutor] Efficient word count

jfouhy at paradise.net.nz
Fri Jul 22 00:40:05 CEST 2005


Quoting Jorge Louis De Castro <jorge at bcs.org.uk>:

> I was wondering, and maybe this is because I come from a different
> programming language background, but is a word count using len(list)
> after a string.split, efficient if there are many words? Or should I
> write my own word count for large(ish) blocks of text (500-1000)?

Well, here are a few attempts at finding other ways of doing that:

E:\Python24\Lib>python timeit.py -s "foo = 'word wrd wordwordword '*1000"
"len(foo.split())"
1000 loops, best of 3: 1.44 msec per loop

E:\Python24\Lib>python timeit.py -s "foo = 'word wrd wordwordword '*1000"
"len([c for c in foo if c.isspace()])"
100 loops, best of 3: 9.18 msec per loop

E:\Python24\Lib>python timeit.py -s "foo = 'word wrd wordwordword '*1000"
"len([c for c in foo if c == ' '])"
100 loops, best of 3: 4.33 msec per loop
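
If you would rather run the comparison from a script than from the command
line, something like the following should work. It is only a sketch: the
function names (count_split, count_isspace, count_eq_space, report) are made up
for illustration, and the numbers you get will depend on your machine and
Python version.

```python
import timeit

# Same test string as in the timings above: 3000 words,
# each followed by exactly one space.
foo = 'word wrd wordwordword ' * 1000


def count_split(s):
    # Split on runs of whitespace and count the resulting pieces.
    return len(s.split())


def count_isspace(s):
    # Count whitespace characters (not words).
    return len([c for c in s if c.isspace()])


def count_eq_space(s):
    # Count literal space characters (not words).
    return len([c for c in s if c == ' '])


def report(number=20, repeat=3):
    # Print the best per-call time for each approach, in milliseconds.
    for fn in (count_split, count_isspace, count_eq_space):
        best = min(timeit.repeat(lambda: fn(foo), number=number, repeat=repeat))
        print("%-15s %.3f msec per call" % (fn.__name__, best / number * 1000))


if __name__ == "__main__":
    report()
```

Note that the last two count whitespace characters, so they only match the
word count when every word is followed by exactly one space, as in this test
string. For 'a b' they give 1, while len('a b'.split()) gives 2.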

At a guess, you might be able to do it faster if you wrote a word counter in C,
because you could avoid building the intermediate list of words.  Otherwise,
len(s.split()) is probably the quickest.
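
To show what "avoid building the list" means, here is a rough sketch of that
kind of counter, written in Python rather than C. The name count_words is just
for illustration. It scans the string once and counts the places where a word
starts. I would expect it to be slower than len(s.split()) in pure Python,
since the loop runs in the interpreter instead of in C, so treat it as a
picture of the logic, not as a speed-up.

```python
def count_words(s):
    # Count words by counting transitions from whitespace to non-whitespace,
    # without ever building a list of the words themselves.
    count = 0
    in_word = False
    for c in s:
        if c.isspace():
            in_word = False
        elif not in_word:
            # First character of a new word.
            in_word = True
            count += 1
    return count


if __name__ == "__main__":
    foo = 'word wrd wordwordword ' * 1000
    print(count_words(foo), len(foo.split()))
```

For ordinary text this should agree with len(s.split()), including when there
are leading, trailing, or repeated spaces.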

See also http://www.python.org/doc/essays/list2str.html :-)

-- 
John.

