[Tutor] Word count help

Fabrizio facelle@tiscalinet.it
Mon, 29 Jan 2001 22:39:28 +0100


Hello,

I am a newbie and I am trying to write a simple program that counts how many
times each word in a text file appears in the text itself.

It reads a line from the file, splits it into a list of words, then counts
them, saving the results so they can be added to those of the following lines.
The program returns two lists: one contains all the words that appear in
the text, and the second contains the corresponding frequency of each
word.
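
To make the intended result concrete, here is a minimal sketch of the
two-list output on a tiny input. It is a simplified stand-in, not the
attached script: the name count_words and the sample lines are made up
for illustration only.

```python
def count_words(lines):
    """Return (words, counts) as two parallel lists.

    words[i] is a distinct word, in order of first appearance,
    and counts[i] is how many times it occurs across all lines.
    """
    words, counts = [], []
    for line in lines:
        # split() with no argument splits on any run of whitespace
        for word in line.split():
            if word in words:
                counts[words.index(word)] += 1
            else:
                words.append(word)
                counts.append(1)
    return words, counts


# hypothetical sample input: two short "lines" of text
words, counts = count_words(["the cat sat", "the cat ran"])
print(words)   # ['the', 'cat', 'sat', 'ran']
print(counts)  # [2, 2, 1, 1]
```

Like the attached script, this searches a list for every word, so it only
shows the shape of the result, not a fast way to get it.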

See the attached script (I hope there are no mistakes, since I cut and pasted
it and translated the function and variable names from Italian).

It seems to work fine, but it is very slow: on large .txt files
(500 KB or more) it takes several hours (!) to finish.
Is there any way to improve it and make it faster?

Thanks in advance,

Fabrizio C.



-----------------

class TextProcess :

    [......]


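    def Wordcount (self, testo, tablett, freq):
        # testo:   list of the words in one line of text
        # tablett: all distinct words seen so far (updated in place)
        # freq:    freq[i] is the running count for tablett[i]

        dic = []    # words already counted for this line

        for r in testo:
            if tablett.count(r) == 0:
                # first time this word appears anywhere
                tablett.append(r)
                freq.append(testo.count(r))
                dic.append(r)
            else:
                ind = tablett.index(r)
                if dic.count(r) == 0:
                    # known word, not yet counted for this line
                    freq[ind] = freq[ind] + testo.count(r)
                    dic.append(r)

        return tablett, freq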


#-------------------------


class WordProcess(TextProcess):

    def operate (self, testo, tablett1, freq1):

        [....]

        tablett, freq = self.Wordcount(testo, tablett1, freq1)

        return tablett, freq


#--------------------------


    # (method of a class not shown here)
    def ContaParole(self, event):

        tablett1, freq1, tablett2, freq2 = [], [], [], []

        testo = open('text.txt', 'r')

        processor = WordProcess()

        [.....]

        for text in testo.readlines():
            # split the line into words on whitespace
            par = text.split()
            # tablett2 and freq2 accumulate the totals across lines
            tablett1, freq1 = processor.operate(par, tablett2, freq2)

        testo.close()