[Tutor] Tutor Digest, Vol 166, Issue 21

Thu Dec 28 15:51:31 EST 2017

anish singh wrote:

>>> However, I am stuck. I have below code which is not working.
> 
> I don't know how to achieve this programmatically: sorted by the
> number of occurrences in a descending order. If two or more words
> have the same count, they should be sorted
> alphabetically (in an ascending order).

>>> import collections
>>> document = ("Practice makes perfect, you'll get perfecT by practice. "
...             "just practice! just just just!!")
>>> words = ("".join(c for c in word if "a" <= c <= "z")
...          for word in document.lower().split())
>>> freq = collections.Counter(words)
>>> freq
Counter({'just': 4, 'practice': 3, 'perfect': 2, 'by': 1, 'get': 1, 'makes': 1, 'youll': 1})

Given that Counter, or a similar dict, you can sort first by word and then
by word frequency:

>>> pairs = sorted(freq.items()) # sort alphabetically
>>> pairs.sort(key=lambda pair: pair[1], reverse=True) # sort by frequency
>>> pairs
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), 
('makes', 1), ('youll', 1)]

This works because Python's sorting algorithm is "stable", i.e. items with
the same key keep the relative order they had before the sort.
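
To see the stability on its own, here is a minimal sketch with a made-up list
of pairs (not the poster's data) that is already in alphabetical order. Sorting
on the count alone leaves the tied words in that order:

```python
# Hypothetical sample data, already sorted alphabetically by word.
pairs = [("by", 1), ("get", 1), ("just", 4), ("makes", 1)]

# Sort on the count only; reverse=True still keeps tied items in their
# original relative order, because the sort is stable.
by_count = sorted(pairs, key=lambda pair: pair[1], reverse=True)
print(by_count)
# [('just', 4), ('by', 1), ('get', 1), ('makes', 1)]
```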

While you can also achieve that with a single sorted() call

>>> sorted(freq.items(), key=lambda p: (-p[1], p[0]))
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), 
('makes', 1), ('youll', 1)]

the first method is usually clearer.
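
If you want the whole thing as a script rather than an interactive session,
something like the following should do. It is only a sketch: the function name
ranked_words is my own invention, and dropping tokens that contain no letters
at all is an extra assumption that the session above does not make.

```python
import collections

def ranked_words(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    # Lowercase, split on whitespace, keep only the letters a-z of each token.
    words = ("".join(c for c in word if "a" <= c <= "z")
             for word in text.lower().split())
    # Assumption: tokens that end up empty (e.g. "!!") are not counted.
    freq = collections.Counter(w for w in words if w)
    # Negative count gives descending frequency; the word breaks ties.
    return sorted(freq.items(), key=lambda p: (-p[1], p[0]))

document = ("Practice makes perfect, you'll get perfecT by practice. "
            "just practice! just just just!!")
for word, count in ranked_words(document):
    print(word, count)
```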

PS: Both approaches also work with comparison functions, e.g.

>>> def cmp_freqs((w1, f1), (w2, f2)):
...     return -cmp(f1, f2) or cmp(w1, w2)
... 
>>> sorted(freq.iteritems(), cmp_freqs)
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1), 
('makes', 1), ('youll', 1)]

but this is

(1) usually less efficient
(2) limited to Python 2

so I can't recommend the cmp-based solution.
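
For completeness: the cmp argument and the cmp() builtin are gone in Python 3,
but if you already have a comparison function, functools.cmp_to_key() can wrap
it as a key. The sketch below rewrites cmp_freqs without cmp() and uses a small
made-up dict; the key-based versions above are still the ones I would use.

```python
import functools

def cmp_freqs(a, b):
    (w1, f1), (w2, f2) = a, b
    # Higher count first; on equal counts fall back to alphabetical order.
    # (x > y) - (x < y) stands in for Python 2's cmp(x, y).
    return (f2 > f1) - (f2 < f1) or (w1 > w2) - (w1 < w2)

# Hypothetical sample data for illustration.
freq = {"by": 1, "just": 4, "get": 1, "practice": 3}
ranked = sorted(freq.items(), key=functools.cmp_to_key(cmp_freqs))
print(ranked)
# [('just', 4), ('practice', 3), ('by', 1), ('get', 1)]
```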