When Python outruns C++ ...
Jp Calderone
exarkun at intarweb.us
Tue Apr 1 02:16:11 EST 2003
On Tue, Apr 01, 2003 at 01:48:26AM +0000, Julian Tibble wrote:
> [snip]
>
>
> Julian
>
> p.s. maybe my python version is not very good, feel free to make
> it go faster:
>
> #
> # Count the number of occurences of each word in a file
> # -- Julian Tibble
> #
>
> import re
>
> # "Word -> Number of occurences" mapping
> wordcount = {}
>
> # Regular expression to match a word
> re_word = re.compile('[a-z]+')
>
> # Read in whole file and split into words
> try:
> while 1:
> line = raw_input().lower()
> line = re_word.findall(line)
> for word in line:
> if wordcount.has_key(word):
> wordcount[word] += 1
> else:
> wordcount[word] = 1
> except EOFError:
> pass
>
>
> # Order by descending number of occurences
> sortable = zip(wordcount.values(), wordcount.keys())
> sortable.sort()
> sortable.reverse()
>
> for n,w in sortable:
> print w,n
>
It looks like this could easily be improved with the application of a
slightly smarter searching algorithm, such as Rabin-Karp search or
Boyer-Moore find.
Both of these use a technique of doing a little extra work up front to
make each pass through the loop much less expensive. Google for details.
Jp
--
"There is no reason for any individual to have a computer in their
home."
-- Ken Olson, President of DEC, World Future Society
Convention, 1977
--
up 12 days, 2:00, 6 users, load average: 0.00, 0.00, 0.00
More information about the Python-list
mailing list