Sorting dictionary by value

Jim Dennis jimd at vega.starshine.org
Wed Mar 27 22:18:25 EST 2002


In article <slrna9m5sc.ai9.arturs at aph.waw.pdi.net>, Artur Skura wrote:

>Duncan Booth wrote:
>> Artur Skura <arturs at iidea.pl> wrote in 
>> news:slrna9lqj1.9n1.arturs at aph.waw.pdi.net:
>>> Is there an idiom in Python as to sorting dictionary by value,
>>> not keys? I came up with some solutions which are so inefficient
>>> that I'm sure there must be a simple way.
>> How do you know they are inefficient? Have you profiled your application 
>> and found this to be a bottleneck?

>No, and it seems the problem is not with sorting.
>I wanted to write a compact word counting script (well, in shell
>it can be done in 5 lines or so), just for fun.


 Just about a week ago I posted a word frequency counting 
 script which counted "words," filtered out some common 
 contractions and "non-words," tracked "known words" (as 
 per entries from /usr/share/dict/words) and then generated
 a listing by highest frequency first.  A couple of days later
 I also posted a modified version that would shove its results
 into a PostgreSQL database table (the change only took four lines).

 I could mail it to you if you like, but I'd be surprised if
 it's not still floating around.

 (BTW: for performance, it handles almost 1800 man pages, 
 averaging 7 KB each, in less than 2 minutes on my mid-range 
 (dual 650 MHz Pentium) desktop box.)

 The whole thing is only about 85 lines long and the core
 function is less than ten.  As so many people have suggested
 in this thread, it simply uses a dictionary (awk calls them
 "associative arrays," Perl calls them "hashes").

 The core loop is something like:

    freq = {}
    for line in file:    # "file" is an already-open file object
        for word in line.split():
            if word in freq:
                freq[word] += 1
            else:
                freq[word] = 1

 (assuming Python 2.2 for file iteration and dictionary membership
 support using "in".  I *really* like those 2.2 features!  They make
 my pseudo-code so executable!)
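
 To get from that dictionary to a listing by highest frequency
 first (which is what the original question was after), one
 possible approach is to turn the items into (count, word) pairs
 and sort those.  This is only a rough sketch, not the code from
 the script I posted; the names count_words, by_frequency and
 sample are made up for illustration, and ties come out in reverse
 alphabetical order, which may not be what you want.

```python
def count_words(lines):
    """Count how often each whitespace-separated word occurs."""
    freq = {}
    for line in lines:
        for word in line.split():
            if word in freq:
                freq[word] += 1
            else:
                freq[word] = 1
    return freq


def by_frequency(freq):
    """Return (count, word) pairs, highest count first."""
    # Put the count first so the default tuple ordering sorts on it.
    pairs = [(count, word) for word, count in freq.items()]
    pairs.sort()
    pairs.reverse()
    return pairs


# Hypothetical sample input; any iterable of lines (such as an open
# file) should work the same way.
sample = ["the cat sat on the mat", "the dog sat"]
ranked = by_frequency(count_words(sample))
for count, word in ranked:
    print("%5d %s" % (count, word))
```

 Swapping each pair around before sorting avoids writing a custom
 comparison function, which tends to be the slow part.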



