Smart text parsing

Mathias Mamsch Zabelkind at
Fri Feb 6 03:07:41 CET 2004


I have a text of about 1 million words. I want to count how often each word
occurs and put the results into a list sorted by frequency,
like " list = [(most-common-word,1001),(2nd-word,986), ...] "

I think about 10% of the words (about 100,000) are distinct.

I am wondering whether you can suggest something faster than my first,
straightforward approach:
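from functools import cmp_to_key   # lets sort() accept a comparison function

s = "Hello this is my 1 million word text"

s2 = s.split()       # split on whitespace into a list of words
counts = {}
for i in s2:         # the loop needs 10s
    if i in counts:
        counts[i] += 1
    else:
        counts[i] = 1    # first occurrence of this word
items = list(counts.items())
#   this is slow:
#   the comparison function orders the pairs by count, highest first
items.sort(key=cmp_to_key(lambda x, y: 2*(x[1] < y[1])-1))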
That works, but I wonder whether there is a faster, more elegant way to do this.
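
One direction that might be worth comparing against is sketched below. It
assumes collections.Counter from the standard library, which only appeared in
Python versions released well after this post, so treat it as an illustration
and not as a measured answer. The names count_words and sample are made up for
the example, and whether it is actually faster on a 1 million word text would
need to be timed.

```python
from collections import Counter


def count_words(text):
    """Return (word, count) pairs ordered from most to least common."""
    # Counter tallies every whitespace-separated token in a single pass,
    # replacing the explicit "is the key already there?" branch.
    counts = Counter(text.split())
    # most_common() with no argument returns all pairs, highest count first,
    # so no comparison function is needed for the sort.
    return counts.most_common()


# Hypothetical stand-in for the real 1 million word text.
sample = "the cat and the dog and the bird"
result = count_words(sample)
print(result[:2])
```

The main design choice here is to sort on the count directly instead of
calling a Python-level comparison function for every pair of items, which is
the part marked as slow above.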

Thanks for your interest,
    Mathias Mamsch
