Issue in printing top 20 dictionary items by dictionary value
Peter Otten
__peter__ at web.de
Sat Oct 4 06:20:07 EDT 2014
Shiva wrote:
> Hi All,
>
> I have written a function that
> -reads a file
> -splits the words and stores it in a dictionary as word(key) and the total
> count of word in file (value).
>
> I want to print the words with top 20 occurrences in the file in reverse
> order - but can't figure it out. Here is my function:
>
> def print_top(filename):
>
>     #Open a file
>     path = '/home/BCA/Documents/LearnPython/ic/'
>     fname = path + filename
>     print ('filename: ',fname)
>     filetext = open(fname)
>
>     #Read the file
>     textstorage={}
>
>     #print(type(textstorage))
>     readall = filetext.read().lower()
>     eachword = set(readall.split())
>
>     #store split words as keys in dictionary
>     for w in eachword:
>         textstorage[w] = readall.count(w)
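Using count() here is very inefficient: it rescans the whole text once for
every distinct word. Because str.count() counts substring matches, it also
overcounts, for example "the" is found inside "there". A better approach is
to increment the dict value:

    for w in readall.split():
        textstorage[w] = textstorage.get(w, 0) + 1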
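
To make the difference concrete, here is a small sketch comparing the two
ways of counting. The sample text and the helper names
count_with_str_count() and count_with_get() are made up for illustration;
they are not part of the original function.

```python
def count_with_str_count(text):
    """Count as in the original post: str.count() for every distinct word."""
    counts = {}
    for w in set(text.split()):
        # str.count() counts substring matches in the whole text,
        # so "the" is also found inside "there".
        counts[w] = text.count(w)
    return counts


def count_with_get(text):
    """Count by incrementing the dict value once per word seen."""
    counts = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1
    return counts


sample = "the cat is there and the dog is there"  # made-up sample text
slow = count_with_str_count(sample)
fast = count_with_get(sample)
print(slow["the"], fast["the"])  # prints: 4 2
```

The first helper reports 4 for "the" because it also matches the two
occurrences of "there"; the second counts only whole words.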
>
>     #print top 20 items in dictionary by decending order of val
>     # This bit is what I can't figure out.
>
>     for dkey in (textstorage.keys()):
>         print(dkey,sorted(textstorage[dkey]))??
Apart from the fact that textstorage[dkey] is a single count, so sorted()
has nothing to sort there (it raises a TypeError for an int), the sorting
effort comes too late at that point: you need to sort the dict keys by the
corresponding dict values.
It is possible to write a get_value() function such that

    sorted(textstorage, key=get_value, reverse=True)

gives the keys in the right order, but perhaps it is simpler to convert
textstorage into a list of (count, word) pairs first, something like

    pairs = [(42, "blue"), (17, "red"), (77, "black"), ...]

When you sort that list

    most_common_words = sorted(pairs, reverse=True)

you automatically get (count, word) pairs in the right order and can print
the first 20 with

    for count, word in most_common_words[:20]:
        print(word, count)
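
The two approaches described above can be sketched side by side. This is
only an illustration: the counts in textstorage are made up, and the body of
get_value() is one possible way to write it, not necessarily the one
intended.

```python
# Made-up word counts standing in for the textstorage dict.
textstorage = {"blue": 42, "red": 17, "black": 77, "green": 3}


# Approach 1: sort the keys, using each word's count as the sort key.
def get_value(word):
    return textstorage[word]


words_by_count = sorted(textstorage, key=get_value, reverse=True)

# Approach 2: build (count, word) pairs from the dict and sort those.
pairs = [(count, word) for word, count in textstorage.items()]
most_common_words = sorted(pairs, reverse=True)

# Print at most the first 20 pairs, highest count first.
for count, word in most_common_words[:20]:
    print(word, count)
```

Passing textstorage.get as the key would work as well as a hand-written
get_value(). The two approaches can order words with equal counts
differently, because the pairs are compared by word when the counts tie.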
PS: Once you have it all working have a look at collections.Counter...
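
As a rough idea of what that could look like, here is a minimal sketch
using collections.Counter and its most_common() method. The function name
top_words() and the sample text are made up for illustration.

```python
from collections import Counter


def top_words(text, n=20):
    """Return the n most frequent words as (word, count) pairs."""
    # Counter tallies each item of the iterable it is given.
    return Counter(text.lower().split()).most_common(n)


sample = "red blue red black blue red"  # made-up sample text
for word, count in top_words(sample):
    print(word, count)
```

Note that most_common() returns (word, count) pairs, the reverse of the
(count, word) pairs used in the manual approach.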