help make it faster please

bearophileHUGS at lycos.com
Thu Nov 10 19:43:04 CET 2005


This can be faster, because it avoids doing the same work more than once:

from string import maketrans, ascii_lowercase, ascii_uppercase

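def create_words(afile):
    # Characters to treat as word separators.
    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
    # Build one table that turns separators into spaces and lowercases
    # letters, so each line needs a single translate() call.
    mapper = maketrans(stripper + ascii_uppercase,
                       " "*len(stripper) + ascii_lowercase)
    countDict = {}
    for line in afile:
        # split() never returns empty strings, so no emptiness test is needed.
        for w in line.translate(mapper).split():
            if w in countDict:
                countDict[w] += 1
            else:
                countDict[w] = 1
    # Print the words in alphabetical order with their counts.
    word_freq = countDict.items()
    word_freq.sort()
    for word, freq in word_freq:
        print word, freq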

create_words(file("test.txt"))
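To see what the translation table does on its own, here is a small sketch written for Python 3, where the table is built with str.maketrans instead of string.maketrans. The names STRIPPER, MAPPER and split_words are mine, chosen for illustration, and the separator set is the one above with the repeated "[" dropped:

```python
from string import ascii_lowercase, ascii_uppercase

# Separator characters (assumed set, adapted from the post).
STRIPPER = "'\",;<>{}_&?!():[]\\.=+-*\t\n\r^%0123456789/"

# One table: separators become spaces, uppercase becomes lowercase.
MAPPER = str.maketrans(STRIPPER + ascii_uppercase,
                       " " * len(STRIPPER) + ascii_lowercase)


def split_words(line):
    """Return the lowercase words of a line using one translate() pass."""
    return line.translate(MAPPER).split()


print(split_words("Hello, World! It's 2005."))
# prints ['hello', 'world', 'it', 's']
```

Note that the apostrophe is a separator too, so "It's" comes out as two words.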


If you can load the whole file in memory then it can be made a little
faster...
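A rough sketch of that whole-file idea, again for Python 3 and with hypothetical names (count_words, count_words_in_file): the text is translated and split once instead of once per line. I have not timed it, so whether it helps on your data is worth measuring:

```python
from string import ascii_lowercase, ascii_uppercase

# Separator characters (assumed set, adapted from the post).
STRIPPER = "'\",;<>{}_&?!():[]\\.=+-*\t\n\r^%0123456789/"
MAPPER = str.maketrans(STRIPPER + ascii_uppercase,
                       " " * len(STRIPPER) + ascii_lowercase)


def count_words(text):
    """Count words in a whole text with a single translate() and split()."""
    counts = {}
    for w in text.translate(MAPPER).split():
        counts[w] = counts.get(w, 0) + 1
    # Sorted list of (word, count) pairs, alphabetical by word.
    return sorted(counts.items())


def count_words_in_file(path):
    """Read the whole file into memory, then count its words."""
    with open(path) as f:
        return count_words(f.read())
```

For example, count_words("The cat.\nThe DOG, the cat!") gives [('cat', 2), ('dog', 1), ('the', 3)].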

Bear hugs,
bearophile



