[Tutor] Please comment on this code...WORD FREQUENCY COUNTER

John Fouhy john at fouhy.net
Mon Oct 23 23:22:28 CEST 2006


On 24/10/06, Asrarahmed Kadri <ajkadri at googlemail.com> wrote:
> I have written a program which calculates the frequency of each word that
> appears in a file. I would like to have your feedback.
> This program only handles .txt files. I want it to handle Word documents
> as well. How can I accomplish this?

Dealing with Word documents will be tricky.  You will probably have to
use the pywin32 extensions (pythonwin) to control Microsoft Word
through COM.
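
As a rough sketch of what that could look like: the function name
read_word_text is made up for illustration, and the code assumes
Windows, an installed copy of Microsoft Word, and the pywin32 package.
Treat it as a starting point, not a tested recipe.

```python
import os


def read_word_text(path):
    """Return the plain text of a Word document (hypothetical helper)."""
    # Assumption: pywin32 is installed and Word is available, so this
    # import only succeeds on a suitably set up Windows machine.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word wants an absolute path here.
        doc = word.Documents.Open(os.path.abspath(path))
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

The string that comes back can be split into words exactly like a
line read from a .txt file.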

>  fname = raw_input("Enter the file name to be opened")
>
> fd = open(fname, 'r')
>
> done = 0
> dict1 = {}               # create an empty dictionary to hold word as a key
>                          # and its freq. as the value
> while not done:
>     aLine = fd.readline()
>     if aLine != "" :

What happens if aLine is (for example) "   "?
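
You can check this at the interpreter.  A minimal sketch (words_in is
just an illustrative name, not something from your program):

```python
def words_in(a_line):
    # split() with no arguments splits on any run of whitespace and
    # returns an empty list when the line holds nothing but whitespace.
    return a_line.split()


for sample in ["", "   ", "\n", "two words\n"]:
    # Show the raw line, whether it differs from "", and its words.
    print(repr(sample), sample != "", words_in(sample))
```

A line of spaces is not equal to "", yet it contributes no words.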


> fd.seek(0,0)             # go to the beginning of the file

Ok, you don't need to go through the file twice.  The normal idiom
would be something like:

try:
    dict1[item] += 1
except KeyError:
    dict1[item] = 1

That will allow you to do the whole thing in just one pass through the file.
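
Wrapped in a loop, the idiom looks something like this.  The function
name count_words and the sample sentence are only for illustration:

```python
def count_words(words):
    """Count how often each word occurs, in a single pass."""
    frequencies = {}
    for word in words:
        try:
            frequencies[word] += 1   # word seen before: bump its count
        except KeyError:
            frequencies[word] = 1    # first time we see this word
    return frequencies


print(count_words("the cat and the hat".split()))
```

If you prefer to avoid the try/except, the line
frequencies[word] = frequencies.get(word, 0) + 1 does the same job.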

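Hmm, and the done flag with readline() is more work than you need.
(readline() returns "" only at the end of the file; a blank line comes
back as "\n", so the loop does not stop early.)  You can use a for
loop to iterate through the lines of a file: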

textfile = open(fname, 'r')
for line in textfile:
    pass  # process line here
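
Here is a small self-contained sketch of that loop, run against a
throwaway file with a blank line in the middle.  The names read_lines
and demo, and the file contents, are invented for the example:

```python
import os
import tempfile


def read_lines(path):
    """Collect every line of the file using a plain for loop."""
    lines = []
    textfile = open(path, 'r')
    try:
        for line in textfile:
            lines.append(line)
    finally:
        textfile.close()
    return lines


def demo():
    # Write a temporary three-line file whose second line is blank.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("first line\n\nthird line\n")
    try:
        return read_lines(path)
    finally:
        os.remove(path)


print(demo())
```

The loop visits all three lines, including the blank one, and stops by
itself at the end of the file.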

You might also want to think about giving your variables better names.  E.g.:
  instead of 'dict1', say 'frequencies'
  instead of 'item' say 'word'

So you could end up with:

textfile = open(fname, 'r')
for line in textfile:
    for word in line.split():
        pass  # do something with word
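
Putting the pieces together, one possible finished version is sketched
below.  It assumes a reasonably current Python (for the with
statement), and count_word_frequencies, print_frequencies and demo are
names I picked for the example, not anything from your program:

```python
import os
import tempfile


def count_word_frequencies(path):
    """Return a dict mapping each word in the file to its count."""
    frequencies = {}
    with open(path, 'r') as textfile:
        for line in textfile:
            for word in line.split():
                try:
                    frequencies[word] += 1
                except KeyError:
                    frequencies[word] = 1
    return frequencies


def print_frequencies(frequencies):
    # Print the words in alphabetical order, one per line.
    for word in sorted(frequencies):
        print(word, frequencies[word])


def demo():
    # Count the words of a small temporary file, then delete it.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("the cat\n\nthe hat\n")
    try:
        return count_word_frequencies(path)
    finally:
        os.remove(path)


print_frequencies(demo())
```

Note that this counts "The" and "the" (or "hat" and "hat.") as
different words; lowercasing and stripping punctuation would be a
separate step.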

HTH!

-- 
John.

