[Tutor] Word count help

Jose Amoreira amoreira@mercury.ubi.pt
Tue, 30 Jan 2001 18:20:52 +0000

I haven't really understood everything in your code, so maybe I'll be talking
nonsense.  Anyway, IMHO, your code can be made more readable if you use one
dictionary instead of two lists.  I'm not certain that it gets any faster, but
using a dictionary, I'd write the Wordcount function like this:

def Wordcount(testo, freqs):
    # freqs is a dictionary with words as keys and their frequencies as
    # values; testo is a part of the input file (I'm trying to keep your code
    # structure, but removed the OO because I didn't understand it)
    for word in testo.split():
        if word in freqs:
            freqs[word] += 1   # word seen before: bump its count
        else:
            freqs[word] = 1    # first occurrence of this word
    return freqs

But I really don't like passing the freqs dictionary back and forth like this
(or, in your code, the two lists tablett and freq). I'd rather open the file
in the very function that computes the frequencies, or else send that function
the whole text at once, instead of constantly updating the counts with partial
results from each line stored in the string testo.
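Here is a minimal sketch of the second option, sending the function the whole
text at once. The name word_frequencies is hypothetical, and
freqs.get(word, 0) stands in for the explicit if/else test used above.

```python
def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in text."""
    freqs = {}
    for word in text.split():
        # get() returns 0 for a word not seen yet, so no separate test is needed
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

# Hypothetical usage: read the whole file once and count it in a single call.
# with open("input.txt") as f:
#     freqs = word_frequencies(f.read())
```

The dictionary is created and returned inside the function, so nothing has to
be passed in and out on every call.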

Another thing to watch: if you want to open big files, it is probably better
not to read the whole file at once using readlines(). Instead, use the
readline() method, which reads one line at a time. This doesn't necessarily
mean more time, because if you are short on memory, reading the whole file at
once may force the computer to use virtual memory (disk), which is a lot slower.
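A minimal sketch combining both suggestions: the function opens the file
itself and reads it with readline(), one line at a time, so only one line is
held in memory at any moment. The name word_frequencies_from_file and its
path argument are hypothetical.

```python
def word_frequencies_from_file(path):
    """Count word frequencies in the file at path, reading one line at a time."""
    freqs = {}
    with open(path) as f:
        line = f.readline()
        while line:  # readline() returns "" only at end of file
            for word in line.split():
                freqs[word] = freqs.get(word, 0) + 1
            line = f.readline()
    return freqs
```

In current Python, "for line in f:" iterates over the file lazily in the same
way and is the more common idiom.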

Once again, I don't know if using dictionaries makes the code faster for large
files. It probably won't. But you must take into account that a 500 KB file
contains quite a lot of words. As it runs, the code must store a few thousand
words and, for each new word read, must make *a lot* of checks to see if it has
already been entered in the lists, or dictionary, or whatever. My opinion is
that this is a lot of work even for a fast computer, and Python is not (and
doesn't claim to be) a fully compiled language like C/C++ or Fortran...

I hope this helps!
So long
Ze Amoreira