Extending collections.Counter with top_n() to return elements by rank
collections.Counter has a most_common([n]) method that returns the n most common elements of the counter, but in case of a tie the result is unspecified, although in practice insertion order breaks the tie. For example:

    >>> Counter(["a","a","b","a","b","c","c","d"]).most_common(2)
    [('a', 3), ('b', 2)]
    >>> Counter(["a","a","c","a","b","b","c","d"]).most_common(2)
    [('a', 3), ('c', 2)]

In some cases (which I believe are not rare) you would like to break the tie yourself or get the top elements by *rank*. Using our example:

    Rank  Elements
    0     {"a"}
    1     {"b", "c"}
    2     {"d"}

I propose a new method top_n(n) that returns the elements whose rank is at most n. For example:

    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(0)
    [('a', 3)]
    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(1)
    [('a', 3), ('b', 2), ('c', 2)]
    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(2)
    [('a', 3), ('b', 2), ('c', 2), ('d', 1)]
    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(99)
    [('a', 3), ('b', 2), ('c', 2), ('d', 1)]
    >>> Counter(["a","a","b","a","b","c","c","d"]).top_n(-1)
    []

Some points to discuss:

* What should the return type be? A list of tuples like most_common(), or a List[Tuple[int, List[T]]] that conveys the rank information too? In the latter, each tuple is a rank whose first element is the frequency and whose second element is the list of elements with that frequency. E.g. [(3, ['a']), (2, ['b', 'c']), (1, ['d'])]
* Should ranks start at 0 or 1?
* Should negative numbers raise an exception or return an empty list, like most_common()?

I would love to hear your opinion on this, and if there is interest, I am happy to try implementing it too.

Regards,
Bora M. Alper
https://boramalper.org/
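
For discussion, the proposed behaviour could be sketched roughly as follows. This is only an illustration under assumptions, not a reference implementation: it assumes 0-based ranks, a most_common()-style list of (element, count) tuples, and an empty list for negative n. The subclass name RankedCounter is hypothetical and exists only so the sketch can run without patching Counter itself.

```python
from collections import Counter
from itertools import groupby


class RankedCounter(Counter):
    """Hypothetical Counter subclass illustrating the proposed top_n()."""

    def top_n(self, n):
        """Return (element, count) pairs whose 0-based rank is at most n."""
        # Assumption: negative n yields an empty list, like most_common().
        if n < 0:
            return []
        result = []
        # most_common() sorts by count in descending order, so consecutive
        # pairs with equal counts form one rank when grouped by count.
        ranks = groupby(self.most_common(), key=lambda pair: pair[1])
        for rank, (_count, pairs) in enumerate(ranks):
            if rank > n:
                break
            result.extend(pairs)
        return result


counter = RankedCounter(["a", "a", "b", "a", "b", "c", "c", "d"])
print(counter.top_n(1))   # [('a', 3), ('b', 2), ('c', 2)]
print(counter.top_n(-1))  # []
```

Within a rank, this sketch simply keeps whatever order most_common() produces, so it leaves the tie-breaking question open for the caller.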