[Tutor] How can I make this run faster?
bgailer at gmail.com
Mon Dec 21 19:23:56 CET 2009
Emad Nawfal (عمـ نوفل ـاد) wrote:
> Dear Tutors,
> The purpose of this script is to see how many vocalized forms map to a
> single consonantal form. For example, the form "fn" could be fan, fin,
> The input is a large list (taken from a file) that has ordinary words.
What does "ordinary" mean.
Are all the letteers lower case?
> The script creates a devocalized list, then compares the two lists.
> The problem: It takes over an hour to process 1 file. The average file
> size is 400,000 words.
> Question: How can I make it run faster? I have a large number of files.
The major consumer of time is the "outer product:" (for loop with list
comprehension). That is 400,000 squared operations!
And it is not necessary!
Also the program repeatedly call devocalize for the same word!
> Note: I'm not a programmer, so please avoid very technical terms.
You are writing a program. Does not that make you a programmer? How can
I tell what terms are too technical for you?
> Thank you in anticipation.
You are welcome in retrospect. Observations follow:
> def devocalize(word):
> vowels = "aiou"
> return "".join([letter for letter in word if letter not in vowels])
What happenex to"e"?
Use translate to delete the vowels. That will improve speed.
"translate( table[, deletechars])
Return a copy of the string where all characters occurring in the
optional argument deletechars are removed, and the remaining characters
have been mapped through the given translation table, which must be a
string of length 256."
Here is a program that will be much faster. I changed names to be more
meaningful (to me), and created a main function (standard operating
procedure in Python).
def devowel(word, table=''.join(chr(i) for i in range(256)), vowels="aiou"):
return word.translate(table, vowels)
allOriginalWords = ['him', 'ham', 'hum', 'fun', 'fan']
uniqueOriginalWords = set(allOriginalWords)
wordDict = collections.defaultdict(list)
for word in uniqueOriginalWords:
for lex, wordList in wordDict.iteritems():
print lex, " ".join(wordList)
Chapel Hill NC
More information about the Tutor