[Tutor] Please look at my wordFrequency.py
kent37 at tds.net
Tue Oct 11 12:24:29 CEST 2005
Dick Moores wrote:
> (Execution took about 30 sec. with my computer.)
That's way too long.
> Specifically, I'm hoping for comments on or help with:
> 2) I've tried to put in remarks that will help most anyone to understand
> what the code is doing. Have I succeeded?
Yes, I think so.
> 3) No modularization. Couldn't see a reason to do so. Is there one or two?
> Specifically, what sections should become modules, if any?
As Danny says, breaking it up into functions makes it easier to understand and test.
> 4) Variable names. I gave up on making them self-explanatory. Instead, I
> put in some remarks near the top of the script (lines 6-10) that I hope
> do the job. Do they? In the code, does the "L to newL to L to newL to L"
> kind of thing remain puzzling?
Some of your variables seem unnecessary. For example
newWord = word.strip(chars)
word = newWord
could be just
word = word.strip(chars)
> 5) Ideally, abbreviations that end in a period, such as U.N., e.g., i.e.,
> viz. op. cit., Mr. (Am. E.), etc., should not be stripped of their final
> periods (whereas other words that end a sentence SHOULD be stripped). I
> tried making and using a Python list of these, but it was too tough to
> write the code to use it. Any ideas?
You should be able to do this with regular expressions or by searching within the word. You want to test for a word that ends with a period but doesn't contain any other periods. Something like
if word.endswith('.') and '.' not in word[:-1]:
    word = word[:-1]
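Here is a minimal sketch of that test wrapped in a function. The function name, the sample words, and the KNOWN_ABBREVIATIONS set are my own assumptions for illustration, not part of the original script. The set is there because the "no other periods" test alone would still strip single-period abbreviations such as "Mr." or "viz."

```python
# Hypothetical set of single-period abbreviations; extend as needed.
KNOWN_ABBREVIATIONS = {"Mr.", "viz.", "op.", "cit."}

def strip_final_period(word):
    """Drop a sentence-ending period, but leave abbreviations alone."""
    if word in KNOWN_ABBREVIATIONS:
        return word                      # listed abbreviation: keep as is
    if word.endswith('.') and '.' not in word[:-1]:
        return word[:-1]                 # only one period, at the end: strip it
    return word                          # e.g. "U.N." or a word with no period

sample_words = ["U.N.", "e.g.", "Mr.", "sentence.", "plain"]
cleaned = [strip_final_period(w) for w in sample_words]
print(cleaned)
```

A listed abbreviation that happens to end a sentence keeps its period under this approach, which is probably acceptable for a word-frequency count.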
Use re.split() to do all the splits at once. Something like
L = re.split(r'\s+|--|/', textAsString)
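As a small illustration of what that one call does, here is a sketch with made-up sample strings (the variable names are assumptions). Note that adjacent or leading separators produce empty strings in the result, which is why empty elements need removing afterwards.

```python
import re

# One pattern covers runs of whitespace, double hyphens, and slashes.
SPLIT_PATTERN = r'\s+|--|/'

text_as_string = "one two--three/four  five\nsix"
words = re.split(SPLIT_PATTERN, text_as_string)
print(words)

# A leading separator, or two separators in a row, yields empty strings.
with_empties = re.split(SPLIT_PATTERN, " a-- b")
print(with_empties)
```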
#remove empty elements in L
while "" in L:
The above iterates L twice for each empty word! The remove() calls are expensive too because the remaining elements of L must be shifted down. Do the whole thing in one pass over L with
L = [ w for w in L if w ]
You only need to remove empty elements once, when the rest of the processing is done.
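To compare the two approaches side by side, here is a sketch on a made-up list. The body of the while loop is my assumption about what the original script does (a call to remove("")), so treat it as illustrative only.

```python
words = ["", "the", "", "cat", "", "sat"]

# Slow version (assumed shape of the original loop): each pass scans the
# list once for the membership test and again inside remove(), and remove()
# then shifts every later element down by one.
slow = list(words)
while "" in slow:
    slow.remove("")

# One-pass version: keep every element that is not the empty string.
fast = [w for w in words if w]

print(slow == fast)
```

Both give the same list; the difference only shows up in running time when the list is long and contains many empty strings.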
for e in saveRemovedForLaterL: