[Tutor] Please look at my wordFrequency.py

Kent Johnson kent37 at tds.net
Tue Oct 11 12:24:29 CEST 2005

Dick Moores wrote:
> (Execution took about 30 sec. with my computer.)

That's way too long.

> Specifically, I'm hoping for comments on or help with:
> 2) I've tried to put in remarks that will help most anyone to understand 
> what the code is doing. Have I succeeded?

Yes, I think so.

> 3) No modularization. Couldn't see a reason to do so. Is there one or two?
> Specifically, what sections should become modules, if any?

As Danny says, breaking it up into functions makes it easier to understand and test.

> 4) Variable names. I gave up on making them self-explanatory. Instead, I 
> put in some remarks near the top of the script (lines 6-10) that I hope 
> do the job. Do they? In the code, does the "L to newL to L to newL to L" 
> kind of thing remain puzzling?

Some of your variables seem unnecessary. For example,
    newWord = word.strip(chars)
    word = newWord
could be just
    word = word.strip(chars)

> 5) Ideally, abbreviations that end in a period, such as U.N., e.g., i.e., 
> viz. op. cit., Mr. (Am. E.), etc., should not be stripped of their final 
> periods (whereas other words that end a sentence SHOULD be stripped). I 
> tried making and using a Python list of these, but it was too tough to 
> write the code to use it. Any ideas?

You should be able to do this with regular expressions or by searching in the word. You want to test for a word that ends with a period but doesn't include any other periods. Something like
if word.endswith('.') and '.' not in word[:-1]:
  word = word[:-1]
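
Here is a rough sketch of that test wrapped in a function so it can be tried on a few words. The function name and the sample words are made up for illustration. Note that the test only protects abbreviations with an internal period (U.N., e.g.); a single-period abbreviation such as Mr. still loses its period and would need a lookup list.

```python
def strip_sentence_period(word):
    """Drop a final period unless the word contains another period."""
    if word.endswith('.') and '.' not in word[:-1]:
        return word[:-1]
    return word

# Hypothetical sample words, not taken from the original script.
words = ['end.', 'U.N.', 'e.g.', 'Mr.', 'word']
stripped = [strip_sentence_period(w) for w in words]
print(stripped)
```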

Other notes:
Use re.split() to do all the splits at once. Something like
  L = re.split(r'\s+|--|/', textAsString)
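
A small self-contained illustration of that call, using an invented sample string in place of your text. It also shows that adjacent separators and leading whitespace produce empty strings in the result.

```python
import re

# Hypothetical sample text standing in for textAsString.
textAsString = " one two--three/four  five\nsix -- seven"

# Split on runs of whitespace, double hyphens, or slashes in one call.
L = re.split(r'\s+|--|/', textAsString)
print(L)
```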

#remove empty elements in L
while "" in L:
The above iterates L twice for each empty word: once for the 'in' test and once for remove(). The remove() calls are expensive too, because the remaining elements of L must be shifted down. Do the whole thing in one pass over L with
    L = [ w for w in L if w ]

You only need to remove empty elements once, when the rest of the processing is done.
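
For example, with a made-up list in place of your L, the one-pass filter behaves like this:

```python
# Hypothetical sample list standing in for the script's L.
L = ['the', '', 'quick', '', '', 'fox', '']

# One pass over L: keep only the non-empty strings.
L = [w for w in L if w]
print(L)
```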

for e in saveRemovedForLaterL:
could be
