[Tutor] Please look at my wordFrequency.py
Dick Moores
rdm at rcblue.com
Tue Oct 11 10:43:44 CEST 2005
John Fouhy wrote at 14:47 10/10/2005:
>Some comments:
>
>----
>textAsString = input.read()
>
>S = ""
>for c in textAsString:
>    if c == "\n":
>        S += ' '
>    else:
>        S += c
>----
>
>You could write this more concisely as:
>
>S = textAsString.replace('\n', ' ')
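A quick way to convince yourself the two versions agree is to run both on the same text. This is only a sketch; the sample string and the names text_as_string, looped and replaced are made up for illustration.

```python
text_as_string = "the quick\nbrown fox\n"  # made-up sample input

# Character-by-character version, as in the original script.
looped = ""
for c in text_as_string:
    if c == "\n":
        looped += " "
    else:
        looped += c

# One-call version: every newline becomes a space.
replaced = text_as_string.replace("\n", " ")

print(looped == replaced)
```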
Yes! Thanks. That should have occurred to me.
>----
># At this point, each element ("word" in code below) of L is
># a string containing a real word such as "dog",
># where "dog" may be prefixed and/or suffixed by strings of
># non-alphanumeric characters. So, for example, word could be "'dog?!".
># The following code first strips these prefixed or suffixed non-alphanumeric
># characters and then finds any words with dashes ("--") or forward slashes ("/"),
># such as in "and/or". These then become 2 or more words without the
># dashes or slashes.
>----
>
>What about using regular expressions?
>
>re.sub(r'\W+', ' ', S) will replace each run of non-alphanumeric characters
>with a single ' '. By the looks of things, the only difference is that if
>you had something like 'foo.bar' or 'foo&bar', your code would leave
>that as one word, whereas using the regex would convert it into two
>words.
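For anyone following along, here is a minimal sketch of that suggestion. The function name split_words and the sample string are made up; re.sub takes the pattern, the replacement and the string to work on, in that order. Note that \W treats the underscore as a word character, so "_" is not replaced.

```python
import re

def split_words(text):
    """Replace each run of non-word characters with a space, then split."""
    return re.sub(r"\W+", " ", text).split()

# "'dog?!" loses its punctuation; "and/or" and "foo.bar" each become two words.
words = split_words("'dog?! and/or foo.bar")
print(words)
```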
Well, I'll have to learn the re module first. But I will.
>If you want to keep the meaning of your code intact, you could still
>use a regex to do it. Something like (untested)
>re.sub(r'\b\W+|\W+\b|-+|/+', ' ', S) might work.
>
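One detail worth checking with this second pattern: in an ordinary string literal "\b" means a backspace character, so the pattern needs to be a raw string for re to see a word boundary. The sketch below is a quick check, not a tested replacement for the original code; the names PATTERN and clean are made up. In this check "foo.bar" still comes out as two words, so the pattern may not preserve the original behaviour exactly.

```python
import re

# In a normal literal "\b" is a single backspace character; in a raw
# string it stays as backslash + "b", which re reads as a word boundary.
assert "\b" == "\x08"
assert len(r"\b") == 2

PATTERN = re.compile(r"\b\W+|\W+\b|-+|/+")

def clean(text):
    """Replace every match of PATTERN with a space and split into words."""
    return PATTERN.sub(" ", text).split()

print(clean("'dog?! and/or"))
print(clean("foo.bar"))
```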
>----
># Remove all empty elements of L, if any
>while "" in L:
>    L.remove("")
>
>for e in saveRemovedForLaterL:
>    L.append(e)
>
>F = []
>
>for word in L:
>    k = L.count(word)
>    if (k,word) not in F:
>        F.append((k,word))
>----
>
>There are a lot of hidden loops in here:
>
>1. '' in L
>This will look at every element of L, until it finds "" or it gets to
>the end.
>2. L.count(word)
>This will also look at every element of L.
>
>If you combine your loops into one, you should be able to save a lot of
>time.
>
>eg:
>
>for e in saveRemovedForLaterL:
>    L.append(e)
>
>counts = {}
>for word in L:
>    if not word: # This skips empty words.
>        continue
>    try:
>        counts[word] += 1
>    except KeyError:
>        counts[word] = 1
>F = [(count, word) for word, count in counts.iteritems()]
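The same single-pass idea can be sketched in a form that also runs on Python 3, where iteritems() no longer exists and items() is used instead. The sample list and the names words, freq and counter_counts are made up for illustration. counts.get(word, 0) does the job of the try/except above: it returns 0 the first time a word is seen, so no KeyError is raised.

```python
from collections import Counter

words = ["dog", "", "cat", "dog", "", "bird", "dog"]  # made-up sample list

# One pass over the list; each word costs one dictionary lookup
# instead of a full scan of the list with count().
counts = {}
for word in words:
    if not word:  # skip empty strings
        continue
    counts[word] = counts.get(word, 0) + 1

# (count, word) pairs, highest count first.
freq = sorted(((count, word) for word, count in counts.items()), reverse=True)

# collections.Counter does the same tallying in one call.
counter_counts = Counter(w for w in words if w)

print(freq)
```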
Things there I don't understand yet, I'm afraid. But I'll get to them.
Thanks for pushing me, John.
Dick