Extending collections.Counter with top_n() to return elements by rank
Bora
bora at boramalper.org
Sun Nov 1 04:56:46 EST 2020
collections.Counter has a most_common([n]) method that returns the n
most common elements of the counter. In case of a tie the result is
unspecified, although in practice the order of insertion breaks the
tie. For example:
>>> Counter(["a","a","b","a","b","c","c","d"]).most_common(2)
[('a', 3), ('b', 2)]
>>> Counter(["a","a","c","a","b","b","c","d"]).most_common(2)
[('a', 3), ('c', 2)]
In some cases (which I believe are not rare) you would like to break
the tie yourself or get the top elements by *rank*. Using our example:
Rank  Elements
0     {"a"}
1     {"b", "c"}
2     {"d"}
I propose a new method top_n(n) that returns the elements in ranks 0
through n, inclusive. For example:
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(0)
[('a', 3)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(1)
[('a', 3), ('b', 2), ('c', 2)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(2)
[('a', 3), ('b', 2), ('c', 2), ('d', 1)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(99)
[('a', 3), ('b', 2), ('c', 2), ('d', 1)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(-1)
[]
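The behaviour in these examples can be approximated today with a small
subclass. What follows is only a sketch under stated assumptions: the
class name RankedCounter is hypothetical, ranks are assumed to start at
0, and a negative n is assumed to return an empty list. It relies on
most_common() returning pairs sorted by descending count, so that equal
counts are adjacent and can be grouped into ranks.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


class RankedCounter(Counter):
    """Hypothetical Counter subclass; not part of the standard library."""

    def top_n(self, n):
        """Return (element, count) pairs whose rank is at most n.

        Assumptions: rank 0 is the highest count, and a negative n
        yields an empty list.
        """
        if n < 0:
            return []
        result = []
        # most_common() is sorted by descending count, so groupby()
        # yields exactly one group per distinct count, i.e. per rank.
        by_count = groupby(self.most_common(), key=itemgetter(1))
        for rank, (_count, pairs) in enumerate(by_count):
            if rank > n:
                break
            result.extend(pairs)
        return result


counter = RankedCounter(["a", "a", "b", "a", "b", "c", "c", "d"])
print(counter.top_n(1))
```

Elements that share a rank come out in whatever order most_common()
gives them; the sketch makes no attempt to break ties within a rank.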
Some points to discuss:
* What should the return type be? A list of tuples like most_common(),
  or List[Tuple[int, List[T]]], which conveys the rank information too?
  In the latter, each tuple is a rank whose first element is the
  frequency and whose second element is the list of elements, e.g.
  [(3, ['a']), (2, ['b', 'c']), (1, ['d'])].
* Should ranks start at 0 or 1?
* Should negative numbers raise an exception or return an empty list
  like most_common()?
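To make the first point concrete, the rank-grouped return type could
look roughly like this. It is a sketch only: grouped_ranks is a
hypothetical helper name, written as a plain function so that it works
on any existing Counter, and it returns every rank rather than taking
a limit.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def grouped_ranks(counter):
    """Return [(count, [elements...]), ...], highest count first.

    Hypothetical helper illustrating the rank-grouped return type.
    """
    # most_common() sorts by descending count, so equal counts are
    # adjacent and groupby() produces one group per rank.
    return [
        (count, [element for element, _count in pairs])
        for count, pairs in groupby(counter.most_common(), key=itemgetter(1))
    ]


ranks = grouped_ranks(Counter(["a", "a", "b", "a", "b", "c", "c", "d"]))
print(ranks)
```

With this shape the position in the list is the rank, so a caller can
slice it (ranks[:2] for the first two ranks) regardless of whether
ranks are described as starting at 0 or 1.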
I would love to hear your opinion on this, and if there is interest, I
am happy to implement it too.
Regards,
Bora M. Alper
https://boramalper.org/