[Tutor] Cheating? No, not me :-)

Wed, 17 Apr 2002 19:54:15 +0200

On Friday, 5 April 2002 23:38, dman wrote:
> On Fri, Apr 05, 2002 at 10:42:58PM +0200, Nicole Seitz wrote:
> | On Sunday, 24 March 2002 23:01, dman wrote:
> | > | For people who are interested: here's a similar problem: "Given a
> | > | text file and an integer K, you are to print the K most common words
> | > | in the file (and the number of their occurrences) in decreasing
> | > | frequency."
> | >
> | > <cheater's hint>
> | > Search the tutor archives.  With only a couple minor modifications
> | > the answer is already there.
> | > </cheater's hint>
> |
> | Any hint WHERE in the archives I could find the answer? Month? Year?
>
> I don't remember, but try googling for :
>     python tutor word count slow perl
>
> which yields this as the second result :
>     http://mail.python.org/pipermail/tutor/2001-February/003403.html
>
> apparently it was February 1, 2001.
>
> Enjoy :-).

Thanks!
My program now runs almost perfectly, and I solved the problem of how to
print the K most common words. Maybe there's an easier way to determine
the most common words; I don't know.
Here's the little function that deals with the most common words. What do
you think of it?

Note: occ is the dictionary where I store the words and their occurrences,
e.g. occ = { "hello":3,"you":123,"fool":23}
-------------------------------------------------------------------------------
def MostCommonWords(occ,K):
    dict = {}

    for key in occ.keys():
        if dict.has_key(occ[key]):
            dict[occ[key]].append(key)
        else:
            dict[occ[key]] = [key]

    key_list = dict.keys()
    key_list.sort()
    key_list.reverse()
    print "Most common word(s): "
    for i in range(int(K)):
        for word in dict[key_list[i]]:
            print "%-8s" % word, "\t",
            #print dict[key_list[ i]],
            print" (occurences: %2i) " % key_list[i]

-------------------------------------------------------------------------------
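
For comparison, here is a minimal sketch of an alternative, assuming a modern
Python 3 interpreter (the function above is Python 2 as of 2002).
collections.Counter and its most_common() method are part of the standard
library; the name most_common_words is only illustrative. Note that the
function above walks the K highest counts rather than K words, so ties can
print more than K words, and it raises IndexError when the text has fewer
than K distinct counts. The sketch returns at most K (word, count) pairs
instead.

```python
from collections import Counter


def most_common_words(occ, k):
    """Return up to k (word, count) pairs, highest count first."""
    # Counter accepts an existing word -> count mapping directly.
    return Counter(occ).most_common(k)


occ = {"hello": 3, "you": 123, "fool": 23}
for word, count in most_common_words(occ, 2):
    print("%-8s\t(occurrences: %2i)" % (word, count))
```

Asking for more words than the dictionary holds simply returns all of them.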

Last but not least, I've got some questions on regexes and other stuff which
you use in your script.

# remove leading and trailing whitespace
        line = string.strip(line)

Why that?
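
One plausible effect of the strip, shown as a small sketch: it assumes
Python 3, where string.strip(line) is spelled line.strip(), and it reuses the
delimiter pattern from the script. The sample line is made up. Without the
strip, the whitespace at both ends of the line counts as a delimiter, so the
split returns empty strings at the start and end of the list.

```python
import re
import string

# Same delimiter pattern as in the script: one or more whitespace
# or punctuation characters.
pattern = "[" + string.whitespace + string.punctuation + "]+"

line = "  hello world\n"  # made-up sample line, as read from a file

unstripped = re.split(pattern, line)
stripped = re.split(pattern, line.strip())

print(unstripped)  # empty strings appear at both ends
print(stripped)    # only the words remain
```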

       # split the string into a list of words
       # a word is delimited by whitespace or punctuation
        for word in re.split(
         "[" + string.whitespace + string.punctuation + "]+",
         line):

I don't understand this regex. Could you explain, please? I guess
string.punctuation is [.,;:?!]. But what's the meaning of
"[" + bla + bla + "]" ???
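
A small sketch of what the concatenation produces, assuming Python 3 (the
sample sentence is made up). string.punctuation is an ordinary string holding
all 32 ASCII punctuation characters, not only .,;:?! and string.whitespace is
likewise a string. Gluing "[" and "]+" around them builds the text of a
regular expression: a character class that matches any one of those
characters, repeated one or more times.

```python
import re
import string

# string.punctuation is a plain string of punctuation characters.
print(repr(string.punctuation))

# "[" + ... + "]+" turns those characters into a regex character class:
# "one or more characters that are whitespace or punctuation".
pattern = "[" + string.whitespace + string.punctuation + "]+"

words = re.split(pattern, "Hello, world! How are you")
print(words)
```

Each run of delimiters, such as a comma followed by a space, is treated as a
single separator.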

# check to make sure the string is considered a word
            if re.match("^[" + string.lowercase + "]+$", word):

Is it necessary to do this? Can't I be sure that the string is considered a
word?
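
A sketch of what the check filters out, assuming Python 3 (where
string.lowercase is spelled string.ascii_lowercase) and a made-up sample
line. The split can return pieces that are not words: digits are neither
whitespace nor punctuation, and a line that ends in punctuation leaves an
empty string at the end of the list.

```python
import re
import string

pattern = "[" + string.whitespace + string.punctuation + "]+"
# A "word" here means one or more lowercase ASCII letters and nothing else.
word_re = re.compile("^[" + string.ascii_lowercase + "]+$")

pieces = re.split(pattern, "the 3 fools said hello.")
words = [p for p in pieces if word_re.match(p)]

print(pieces)  # includes "3" and a trailing empty string
print(words)   # only the lowercase words survive
```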

Many thanks for your help.

Nicole