[Python-ideas] Additions to collections.Counter and a Counter derived class

Marco Cognetta cognetta.marco at gmail.com
Tue Mar 14 05:38:06 EDT 2017


Hi all,

I have been using the Counter class recently and came across several
things that I was hoping to get feedback on. (This is my first time
mailing this list, so any advice is greatly appreciated.)

1) Addition of a Counter.least_common method:

This would add a method to Counter that is basically the opposite of
the pre-existing Counter.most_common method. In this case, the least
common elements are considered the elements in c with the lowest
(non-zero) frequency.

This was addressed in https://bugs.python.org/issue16994, but it was
never resolved and is still open (since Jan. 2013). This is a small
change, but I think that it is useful to include in the stdlib. I have
written a patch for this, but have not submitted a PR yet. It can be
found at https://github.com/mcognetta/cpython/tree/collections_counter_least_common
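For illustration, here is a rough standalone sketch of what such a method could do (the patch itself adds it as a method on Counter; the tie-breaking order here is arbitrary, as with most_common):

```python
from collections import Counter
from heapq import nsmallest
from operator import itemgetter

def least_common(counter, n=None):
    # Mirror image of Counter.most_common: elements with the lowest
    # (non-zero) counts first.  Standalone sketch, not the actual patch.
    items = [item for item in counter.items() if item[1] > 0]
    if n is None:
        return sorted(items, key=itemgetter(1))
    return nsmallest(n, items, key=itemgetter(1))

c = Counter('aabbbcccc')
print(least_common(c, 2))  # [('a', 2), ('b', 3)]
```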

2) Undefined behavior when using Counter.most_common:

Consider c = Counter([1, 1, 2, 2, 3, 3, 'a', 'a', 'b', 'b', 'c',
'c']). When calling c.most_common(3), there are more than 3 "most
common" elements in c, so c.most_common(3) is not guaranteed to always
return the same list: ties are broken arbitrarily, since there is no
defined total order on the elements in c. Should this be mentioned in
the documentation?
Additionally, perhaps there is room for a method that returns all of
the elements whose frequencies are among the n highest, ordered by
frequency. For example, given c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4,
5]), c.aforementioned_method(2) would return [(1, 3), (2, 2), (3, 2),
(4, 2)], since the two highest frequencies are 3 and 2.
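One possible implementation, sketched as a free function (the name most_common_frequencies is just a placeholder):

```python
from collections import Counter

def most_common_frequencies(counter, n):
    # Placeholder name.  Return every (elem, count) pair whose count is
    # among the n highest *distinct* frequencies, highest counts first.
    top = sorted(set(counter.values()), reverse=True)[:n]
    if not top:
        return []
    cutoff = top[-1]
    return [(elem, count) for elem, count in counter.most_common()
            if count >= cutoff]

c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5])
print(most_common_frequencies(c, 2))
# e.g. [(1, 3), (2, 2), (3, 2), (4, 2)] -- order among ties is arbitrary
```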

3) Addition of a collections.Frequency or collections.Proportion class
derived from collections.Counter:

This is sort of discussed in https://bugs.python.org/issue25478.
The idea behind this would be a dictionary that, instead of returning
the integer frequency of an element, returns its proportional
representation in the iterable.

So, for example, f = Frequency('aabbcc') would hold Frequency({'a':
0.3333333333333333, 'b': 0.3333333333333333, 'c':
0.3333333333333333}).

To address

>The pitfall I imagine here is that if you continue adding elements
>after normalize() is called, the results will be nonsensical.

from the issue: this would not be a problem, because we could build it
entirely on top of a Counter, keep a running total of the number of
elements, and divide by that total every time the object or any of its
elements is output or returned.
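A minimal sketch of that approach, assuming the (illustrative) Frequency name and computing the proportion lazily on access so that later updates stay consistent:

```python
from collections import Counter

class Frequency(Counter):
    # Sketch only; name and API are illustrative.  Raw integer counts
    # are stored internally (so Counter.update keeps working), and the
    # proportion is computed on every access, which keeps results
    # correct even after further elements are added.
    def __getitem__(self, key):
        total = sum(self.values())          # values() are raw int counts
        count = super().__getitem__(key)    # 0 for missing keys
        return count / total if total else 0.0

f = Frequency('aabbcc')
print(f['a'])   # 2 of 6 elements -> 0.3333333333333333
f.update('a')
print(f['a'])   # now 3 of 7 elements -> 3/7
```

A fuller version would presumably also override __repr__, items, and friends so that the mapping displays proportions, as in the Frequency({'a': 0.3333333333333333, ...}) example above.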

I think that this would be a pretty useful addition especially for
code related to discrete probability distributions (which is what
motivated this in the first place).

Thanks in advance,

-Marco
