help make it faster please
Bengt Richter
bokr at oz.net
Fri Nov 11 22:47:44 EST 2005
On 10 Nov 2005 10:43:04 -0800, bearophileHUGS at lycos.com wrote:
>This can be faster; it avoids doing the same work multiple times:
>
>from string import maketrans, ascii_lowercase, ascii_uppercase
>
>def create_words(afile):
>    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
>    mapper = maketrans(stripper + ascii_uppercase,
>                       " "*len(stripper) + ascii_lowercase)
Good way to prepare for split: one translate pass lowercases and turns punctuation, digits, and whitespace into spaces.
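To illustrate what that buys you (a small untested example of my own, not from the original post):

from string import maketrans, ascii_lowercase, ascii_uppercase

stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
mapper = maketrans(stripper + ascii_uppercase,
                   " "*len(stripper) + ascii_lowercase)
print "Hello, World! 42".translate(mapper).split()
# expect ['hello', 'world'] -- punctuation and digits become spaces, case is folded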
>    countDict = {}
>    for line in afile:
>        for w in line.translate(mapper).split():
>            if w:
I suspect it's not possible to get '' in the list from somestring.split() (with no separator argument), so the if w: test looks redundant.
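A quick interactive check (doctest-style, from memory):

    >>> "  spam   eggs ".split()
    ['spam', 'eggs']

No empty strings, even with leading, trailing, or repeated spaces.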
>                if w in countDict:
>                    countDict[w] += 1
>                else:
>                    countDict[w] = 1
Does that beat the try and get versions? I.e., (untested)

    try: countDict[w] += 1
    except KeyError: countDict[w] = 1

or

    countDict[w] = countDict.get(w, 0) + 1
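If someone wants actual numbers, here is a rough timeit sketch (untested, and the sample word list is just made-up data):

import timeit

setup = "words = ['spam', 'eggs', 'spam', 'ham', 'spam'] * 1000"

in_else = """
d = {}
for w in words:
    if w in d:
        d[w] += 1
    else:
        d[w] = 1
"""

try_except = """
d = {}
for w in words:
    try:
        d[w] += 1
    except KeyError:
        d[w] = 1
"""

get_version = """
d = {}
for w in words:
    d[w] = d.get(w, 0) + 1
"""

# time each counting idiom on the same data and report the best of 3 runs
for name, stmt in [('in/else', in_else), ('try/except', try_except),
                   ('get', get_version)]:
    print name, min(timeit.Timer(stmt, setup).repeat(3, 100))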
>    word_freq = countDict.items()
>    word_freq.sort()
>    for word, freq in word_freq:
>        print word, freq
>
>create_words(file("test.txt"))
>
>
>If you can load the whole file in memory then it can be made a little
>faster...
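I take that to mean one read() and one translate for the whole file instead of per line, roughly like this (untested sketch of my own; the name create_words_whole is just for illustration, it's not from your post):

from string import maketrans, ascii_lowercase, ascii_uppercase

def create_words_whole(afile):
    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
    mapper = maketrans(stripper + ascii_uppercase,
                       " "*len(stripper) + ascii_lowercase)
    countDict = {}
    # translate and split the whole file at once rather than line by line
    for w in afile.read().translate(mapper).split():
        countDict[w] = countDict.get(w, 0) + 1
    word_freq = countDict.items()
    word_freq.sort()
    for word, freq in word_freq:
        print word, freq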
>
>Bear hugs,
>bearophile
>
Regards,
Bengt Richter