[Tutor] Please look at my wordFrequency.py

Dick Moores rdm at rcblue.com
Tue Oct 11 10:43:44 CEST 2005


John Fouhy wrote at 14:47 10/10/2005:

>Some comments:
>
>----
>textAsString = input.read()
>
>S = ""
>for c in textAsString:
>     if c == "\n":
>         S += ' '
>     else:
>         S += c
>----
>
>You could write this more concisely as:
>
>S = textAsString.replace('\n', ' ')

Yes! Thanks. That should have occurred to me.

>----
># At this point, each element ("word" in code below) of L is
># a string containing a real word such as "dog",
># where "dog" may be prefixed and/or suffixed by strings of
># non-alphanumeric characters. So, for example, word could be "'dog?!".
># The following code first strips these prefixed or suffixed
># non-alphanumeric characters and then finds any words with dashes ("--")
># or forward slashes ("/"), such as in "and/or". These then become 2 or
># more words without the dashes or slashes.
>----
>
>What about using regular expressions?
>
>re.sub(r'\W+', ' ', S) will replace each run of non-alphanumeric
>characters with a single ' '.  By the looks of things, the only difference
>is that if
>you had something like 'foo.bar' or 'foo&bar', your code would leave
>that as one word, whereas using the regex would convert it into two
>words.
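
A minimal sketch of this first suggestion, assuming a made-up sample string. The helper name split_words is illustrative and not part of the original script. Note that \W treats the underscore as a word character, so "non-alphanumeric" is only approximately right.

```python
import re

def split_words(text):
    """Replace each run of non-word characters with one space, then split."""
    return re.sub(r'\W+', ' ', text).split()

sample = "'dog?! and/or foo.bar well--known"
print(split_words(sample))
# ['dog', 'and', 'or', 'foo', 'bar', 'well', 'known']
```

Here 'foo.bar' comes out as two words, which is the difference John describes.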

Well, I'll have to learn the re module first. But I will.

>If you want to keep the meaning of your code intact, you could still
>use a regex to do it.  Something like (untested)
>re.sub(r'\b\W+|\W+\b|-+|/+', ' ', S) might work.
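
One caveat on the untested pattern: \b also matches between "foo" and "." in 'foo.bar', so that word would still be split. A hedged alternative, assuming the goal is to trim punctuation only at the ends of each whitespace-separated token and then split on dashes or slashes, might look like the sketch below. The name tokenize is hypothetical, and treating a single "-" as a separator follows John's "-+" rather than the original script.

```python
import re

def tokenize(text):
    """Trim punctuation from token ends, then split on dashes and slashes."""
    words = []
    for token in text.split():
        # Remove non-word characters from the start and end only,
        # so interior punctuation such as the "." in "foo.bar" survives.
        core = re.sub(r'^\W+|\W+$', '', token)
        # Split on runs of "-" or "/" and drop any empty pieces.
        words.extend(part for part in re.split(r'[-/]+', core) if part)
    return words

print(tokenize("'dog?! and/or foo.bar well--known"))
# ['dog', 'and', 'or', 'foo.bar', 'well', 'known']
```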
>
>----
># Remove all empty elements of L, if any
>while "" in L:
>     L.remove("")
>
>for e in saveRemovedForLaterL:
>     L.append(e)
>
>F = []
>
>for word in L:
>     k = L.count(word)
>     if (k,word) not in F:
>         F.append((k,word))
>----
>
>There are a lot of hidden loops in here:
>
>1. '' in L
>This will look at every element of L, until it finds "" or it gets to 
>the end.
>2. L.count(word)
>This will also look at every element of L.
>
>If you combine your loops into one, you should be able to save a lot of 
>time.
>
>eg:
>
>for e in saveRemovedForLaterL:
>     L.append(e)
>
>counts = {}
>for word in L:
>     if not word:      # This skips empty words.
>         continue
>     try:
>         counts[word] += 1
>     except KeyError:
>         counts[word] = 1
>F = [(count, word) for word, count in counts.items()]

Things there I don't understand yet, I'm afraid. But I'll get to them.
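
For anyone else puzzling over the same loop, here is a small sketch of the single-pass idea, assuming a short made-up list in place of the real L. The function name count_words is illustrative. It uses dict.get with a default of 0 in place of try/except KeyError, which has the same effect: a word seen for the first time starts from zero.

```python
def count_words(words):
    """Return (count, word) pairs in one pass, skipping empty strings."""
    counts = {}
    for word in words:
        if not word:          # "" is falsy, so empty words are skipped
            continue
        # get() returns 0 when the word is not yet a key in the dictionary
        counts[word] = counts.get(word, 0) + 1
    # Build the same (count, word) tuples that the original F held
    return [(count, word) for word, count in counts.items()]

L = ["dog", "", "cat", "dog", "and", "", "dog"]
F = sorted(count_words(L), reverse=True)
print(F)
# [(3, 'dog'), (1, 'cat'), (1, 'and')]
```

Each word is looked at once here, whereas L.count(word) inside the loop rescans the whole list for every word.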

Thanks for pushing me, John.

Dick



