quick question

John Hunter jdhunter at ace.bsd.uchicago.edu
Mon Nov 18 20:28:20 EST 2002


>>>>> "Cousin" == Cousin Stanley <CousinStanley at HotMail.com> writes:

    Cousin> I think 5 seconds for categorizing and counting a couple
    Cousin> of floppy disks worth of votes ain't too bad ...

True...

Since you are interested in performance, you might be able to get an
improvement by processing the whole file with Python's built-in
functions rather than with an explicit line-by-line loop.
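
For a concrete baseline, here is roughly the line-by-line version I
imagine you are running now (a guess on my part, so substitute your
actual filename and tally code):

# A plain Python loop: read each line, strip the newline, tally in a dict
favorites = {}
for line in file('votes.dat').readlines():
    flavor = line.strip()
    favorites[flavor] = favorites.get(flavor, 0) + 1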

The built-in function map often provides substantial performance gains
over a manual loop.  How does something like the following compare:

from __future__ import division


# xreadlines iterates over the file's lines, so votes yields one
# favorite ice cream per line.  Each line keeps its trailing newline,
# which add_vote strips below.
votes = file('votes.dat').xreadlines()

favorites = {}
# We need a custom function to pass to map.  Strip the trailing newline
# so 'chocolate\n' and 'chocolate' count as the same flavor.
def add_vote(v):
    flavor = v.strip()
    favorites[flavor] = favorites.get(flavor, 0) + 1

# Use map rather than an explicit for loop.  You add the cost of the
# function calls, but lose the cost of the Python-level loop.  Who
# wins?  My guess is map.
map(add_vote, votes)

# You don't need to keep a separate running tally; reduce-ing over the
# dictionary's values sums the counts quickly.  (operator.add would
# work here in place of int.__add__.)
total = reduce(int.__add__, favorites.values())

# The rest is the same as before.  Use your fancy output format instead....
votes = [(count, flavor) for (flavor, count) in favorites.items()]
votes.sort()
votes.reverse()  # winners first!

for (count, flavor) in votes:
    print '%s got %d votes (%.2f%%)' % (flavor, count, 100.0*count/total)
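
If you want hard numbers instead of my guess, a crude timer along
these lines would settle the map-versus-loop question (just a sketch;
I am assuming the same votes.dat file and using time.clock):

import time

def tally_with_map(lines):
    counts = {}
    def add_vote(v):
        flavor = v.strip()
        counts[flavor] = counts.get(flavor, 0) + 1
    map(add_vote, lines)
    return counts

def tally_with_loop(lines):
    counts = {}
    for line in lines:
        flavor = line.strip()
        counts[flavor] = counts.get(flavor, 0) + 1
    return counts

lines = file('votes.dat').readlines()   # read once so both see the same data
for name, func in [('map', tally_with_map), ('loop', tally_with_loop)]:
    start = time.clock()
    func(lines)
    print '%s took %.4f seconds' % (name, time.clock() - start)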

Anxiously awaiting your results ....

John Hunter
