Learning Python via a little word frequency program

Fredrik Lundh fredrik at pythonware.com
Wed Jan 9 06:33:56 EST 2008


Andrew Savige wrote:


> Here's my first attempt:
> 
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
> freq = {}
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> deco = zip([-x for x in freq.values()], freq.keys())
> deco.sort()
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
> 
> I'm interested to learn how more experienced Python folks would solve
> this little problem. Though I've read about the DSU Python sorting idiom,
> I'm not sure I've strictly applied it above ... and the -x hack above to
> achieve a descending sort feels a bit odd to me, though I couldn't think
> of a better way to do it.

list.sort takes a reverse flag in recent versions (2.4 and later), so 
you can do a reverse sort as:

    deco.sort(reverse=True)

in older versions, just do:

    deco.sort()
    deco.reverse() # this is fast!
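
A minimal sketch comparing the two approaches above. The sample (count, name) pairs are made up for illustration, and the counts are kept positive, so no -x hack is needed. One caveat: with a plain reverse sort, names that share a count come out in reverse alphabetical order.

```python
# Hypothetical sample data: (count, name) pairs, counts kept positive.
deco = [(1, "freddy"), (3, "kevin"), (2, "jock"), (1, "bill")]

# Recent versions: sort in descending order in one step.
newer = list(deco)
newer.sort(reverse=True)

# Older versions: sort ascending, then reverse the list in place.
older = list(deco)
older.sort()
older.reverse()

# Both approaches agree here because no two tuples compare equal.
assert newer == older
print(newer)
```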

also note that recent versions provide a "sorted" function that 
returns a new sorted list, and both "sort" and "sorted" now allow you to 
pass in a "key" function that's used to generate a sort key for each 
item.  taking that into account, you can simply write:

    # sort items on descending count
    deco = sorted(freq.items(), key=lambda x: -x[1])

simplifying the print statement is left as an exercise.
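
One possible answer to that exercise, as a sketch rather than the intended solution: since the items are now plain (name, count) pairs, the loop can unpack them directly and print the count without negating it. The tuple key (-count, name) is an addition not in the reply above; it breaks ties alphabetically, which matches the ordering of the original decorated sort. The single-argument print call is written so it runs under both Python 2 and Python 3.

```python
names = "freddy fred bill jock kevin andrew kevin kevin jock"

# Count occurrences of each name.
freq = {}
for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

# Sort on descending count, breaking ties alphabetically by name.
ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

# Unpack each (name, count) pair directly; no sign flipping needed.
for name, count in ranked:
    print("%-10s: %d" % (name, count))
```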

> I also have a few specific questions. Instead of:
> 
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> 
> I might try:
> 
> for name in names.split():
>     try:
>         freq[name] += 1
>     except KeyError:
>         freq[name] = 1
> 
> Which is preferred?

for simple scripts and small datasets, always the former.

for performance-critical production code, it depends on how often you 
expect "name" to be present in the dictionary (setting up a try/except 
is cheap, but raising and catching an exception is relatively costly), 
so try/except tends to pay off only when most names are already there.
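
A rough way to check this on your own data, as a sketch: the two function names and the sample inputs below are made up for illustration, and the timings will vary by machine and Python version, so no particular numbers are claimed.

```python
import timeit

def count_with_get(words):
    # Look up each word with a default of 0; never raises.
    freq = {}
    for word in words:
        freq[word] = 1 + freq.get(word, 0)
    return freq

def count_with_try(words):
    # Assume the word is present; fall back to 1 on the first sighting.
    freq = {}
    for word in words:
        try:
            freq[word] += 1
        except KeyError:
            freq[word] = 1
    return freq

# Hypothetical inputs: one word repeated (mostly hits) vs. all distinct
# words (every lookup misses and raises KeyError in the try version).
mostly_hits = ["kevin"] * 1000
all_misses = [str(i) for i in range(1000)]

for label, words in (("mostly hits", mostly_hits), ("all misses", all_misses)):
    # Both versions must produce the same counts.
    assert count_with_get(words) == count_with_try(words)
    t_get = timeit.timeit(lambda: count_with_get(words), number=100)
    t_try = timeit.timeit(lambda: count_with_try(words), number=100)
    print("%-12s get: %.4fs  try/except: %.4fs" % (label, t_get, t_try))
```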

> Ditto for:
> 
> deco = zip([-x for x in freq.values()], freq.keys())
> 
> versus:
> 
> deco = zip(map(operator.neg, freq.values()), freq.keys())

using zip/keys/values to emulate items is a bit questionable.  when I 
need to restructure the contents of a dictionary, I usually prefer items 
(or iteritems, where suitable) and tuple indexing/unpacking in a list 
comprehension (or generator expression, where suitable).
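
A small sketch of that approach, using a made-up dictionary: the list comprehension unpacks each (name, count) item and builds the (-count, name) pairs in one pass, with no need for zip, keys or values. Note that iteritems exists only in Python 2; in Python 3, items itself returns a view and serves the same purpose.

```python
# Hypothetical sample counts.
freq = {"kevin": 3, "jock": 2, "bill": 1}

# Unpack each item and restructure it into a (-count, name) pair.
deco = [(-count, name) for name, count in freq.items()]
deco.sort()

for neg_count, name in deco:
    print("%-10s: %d" % (name, -neg_count))
```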

> Finally, I might replace:
> 
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
> 
> with:
> 
> print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

why?

</F>