[Tutor] Tutor Digest, Vol 166, Issue 21
Peter Otten
__peter__ at web.de
Thu Dec 28 15:51:31 EST 2017
anish singh wrote:
>>> However, I am stuck. I have below code which is not working.
>
> I don't know how to achieve this programmatically: sorted by the
> number of occurrences in a descending order. If two or more words
> have the same count, they should be sorted
> alphabetically (in an ascending order).
>>> import collections
>>> document = "Practice makes perfect, you'll get perfecT by practice. just practice! just just just!!"
>>> words = ("".join(c for c in word if "a" <= c <= "z")
...          for word in document.lower().split())
>>> freq = collections.Counter(words)
>>> freq
Counter({'just': 4, 'practice': 3, 'perfect': 2, 'by': 1, 'get': 1, 'makes': 1, 'youll': 1})
Given that Counter or a similar dict, you can first sort by word and then by
word frequency:
>>> pairs = sorted(freq.items()) # sort alphabetically
>>> pairs.sort(key=lambda pair: pair[1], reverse=True) # sort by frequency
>>> pairs
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1),
('makes', 1), ('youll', 1)]
This works because Python's sorting algorithm is "stable", i.e. items with
equal keys keep the relative order they had before the sort. The alphabetical
order from the first pass therefore survives among words with the same count.
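To see the stability guarantee in isolation, here is a minimal sketch with a small hand-made list (the name `records` and its contents are made up for illustration). The key function only looks at the count, yet the tied words keep their incoming order, even with reverse=True:

```python
# Hypothetical sample data, already in alphabetical order by word.
records = [("by", 1), ("get", 1), ("just", 4), ("makes", 1), ("youll", 1)]

# Sort by count only; the key function never inspects the word.
by_count = sorted(records, key=lambda pair: pair[1], reverse=True)

# The four words with count 1 are still in alphabetical order.
print(by_count)
# [('just', 4), ('by', 1), ('get', 1), ('makes', 1), ('youll', 1)]
```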
While you can also achieve that with a single sorted() call
>>> sorted(freq.items(), key=lambda p: (-p[1], p[0]))
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1),
('makes', 1), ('youll', 1)]
the first method is usually clearer.
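One more point in favour of the two-pass method, sketched under an assumed variation of the task: if ties had to come out in reverse alphabetical order, the tuple key would need a "negated" string, which Python does not offer, whereas two stable passes still work. The shortened `freq` below is sample data for illustration only:

```python
from collections import Counter

# Sample counts for illustration, not the full document from the thread.
freq = Counter({"just": 4, "practice": 3, "perfect": 2, "by": 1, "get": 1})

# Assumed goal: most frequent first, ties in reverse alphabetical order.
# A string key cannot be negated, so use two stable passes instead.
pairs = sorted(freq.items(), key=lambda pair: pair[0], reverse=True)  # words Z to A
pairs.sort(key=lambda pair: pair[1], reverse=True)  # then by frequency, descending

print(pairs)
# [('just', 4), ('practice', 3), ('perfect', 2), ('get', 1), ('by', 1)]
```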
PS: In Python 2 both approaches also work with comparison functions, e.g.
>>> def cmp_freqs((w1, f1), (w2, f2)):
...     return -cmp(f1, f2) or cmp(w1, w2)
...
>>> sorted(freq.iteritems(), cmp_freqs)
[('just', 4), ('practice', 3), ('perfect', 2), ('by', 1), ('get', 1),
('makes', 1), ('youll', 1)]
but this is
(1) usually less efficient
(2) limited to Python 2
so I can't recommend the cmp-based solution.
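For completeness, if a comparison function is wanted under Python 3 anyway, functools.cmp_to_key() can wrap it into a key. The sketch below is an adaptation, not the code from the session above: the local `cmp()` helper stands in for the built-in that Python 3 removed, and `freq` is rebuilt by hand from the counts shown earlier. The efficiency concern still applies.

```python
import functools
from collections import Counter


def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


def cmp_freqs(pair1, pair2):
    # Python 3 dropped tuple parameters, so unpack inside the function.
    w1, f1 = pair1
    w2, f2 = pair2
    # Higher frequency first; fall back to alphabetical order on a tie.
    return -cmp(f1, f2) or cmp(w1, w2)


# Counts copied by hand from the Counter shown earlier in this message.
freq = Counter({"just": 4, "practice": 3, "perfect": 2,
                "by": 1, "get": 1, "makes": 1, "youll": 1})

ranked = sorted(freq.items(), key=functools.cmp_to_key(cmp_freqs))
print(ranked)
```

The result should match what the key-based sorted() call gives for the same counts.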