[Python-ideas] collections.Counter should implement __mul__, __rmul__

Peter Norvig peter at norvig.com
Sun Apr 15 20:44:19 EDT 2018

If you think of a Counter as a multiset, then it should support __or__, not
__add__, right?

I do think it would have been fine if Counter did not support "+" at all
(and/or if Counter were limited to integer values). But given where we are
now, it feels like we should preserve `c + c == 2 * c`.
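For concreteness, here is the asymmetry with the standard-library Counter as it stands. Adding two Counters works, multiplying one by an int raises TypeError, and the scaled Counter has to be built by hand. This is a minimal sketch. It uses only positive counts because Counter's `+` discards zero and negative results, so `c + c == 2 * c` can only be expected to hold for positive counts.

```python
from collections import Counter

c = Counter("abracadabra")  # a=5, b=2, r=2, c=1, d=1

# Addition is defined: counts for matching keys are summed.
doubled_by_add = c + c

# Multiplication by a scalar is not defined on Counter.
try:
    2 * c
except TypeError as exc:
    print("2 * c fails:", exc)

# Current workaround: scale each count with a dict comprehension.
doubled_by_comprehension = Counter({key: 2 * count for key, count in c.items()})

print(doubled_by_add == doubled_by_comprehension)  # True
```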

As to the "doesn't really add any new capabilities" argument, that's true,
but it is also true for Counter as a whole: it doesn't add much over
defaultdict(int), but it is certainly convenient to have a standard way to
do what it does.
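A small standard-library illustration of the defaultdict(int) comparison. Both approaches produce the same tallies. Counter packages the loop and adds counting helpers such as most_common().

```python
from collections import Counter, defaultdict

words = "the cat and the hat".split()

# defaultdict(int): missing keys start at 0, but counting is a manual loop.
counts_dd = defaultdict(int)
for word in words:
    counts_dd[word] += 1

# Counter: the same tallies in one call, plus counting-oriented helpers.
counts_c = Counter(words)

print(dict(counts_dd) == dict(counts_c))  # True
print(counts_c.most_common(1))            # [('the', 2)]
```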

I agree with your intuition that low level is better. `total` would be
useful. If you have total and mul, then as you and others have pointed out,
normalize is just c *= 1/c.total.
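The following sketch shows the low-level approach. It assumes a hypothetical Counter subclass (`ProbCounter`, not a real class) that adds a `total` property and scalar `__mul__`/`__rmul__`. Because it defines no `__imul__`, `c *= 1 / c.total` falls back to `c = c * (1 / c.total)`. Python 3.10 later added a `Counter.total()` method, and the property here shadows it.

```python
from collections import Counter
from numbers import Real


class ProbCounter(Counter):
    """Hypothetical Counter subclass sketching the low-level API:
    a `total` property plus scalar __mul__/__rmul__.
    These are illustrative additions, not collections.Counter's API."""

    @property
    def total(self):
        # Sum of all counts (a property here, as discussed in the thread).
        return sum(self.values())

    def __mul__(self, scalar):
        # Scalar broadcast: multiply every count by the same number.
        if not isinstance(scalar, Real):
            return NotImplemented
        return type(self)({key: count * scalar for key, count in self.items()})

    # Scalar multiplication commutes, so 2 * c means the same as c * 2.
    __rmul__ = __mul__


c = ProbCounter(heads=3, tails=1)
c *= 1 / c.total               # no __imul__, so this rebinds c to c * 0.25
print(c["heads"], c["tails"])  # 0.75 0.25
print(c.total)                 # 1.0
```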

I can also see the argument for a new FrequencyTable class in the
statistics module. (By the way, I refactored my
https://github.com/norvig/pytudes/blob/master/ipynb/Probability.ipynb a
bit, and now I no longer need a `normalize` function.)

On Sun, Apr 15, 2018 at 5:06 PM Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

> > On Apr 15, 2018, at 2:05 PM, Peter Norvig <peter at norvig.com> wrote:
> >
> > For most types that implement __add__, `x + x` is equal to `2 * x`.
> >
> > ...
> >
> >
> > That is true for all numbers, list, tuple, str, timedelta, etc. -- but
> > not for collections.Counter. I can add two Counters, but I can't multiply
> > one by a scalar. That seems like an oversight.
> If you view the Counter as a sparse associative array of numeric values,
> it does seem like an oversight.  If you view the Counter as a Multiset or
> Bag, it doesn't make sense at all ;-)
> From an implementation point of view, Counter is just a kind of dict that
> has a __missing__() method that returns zero.  That makes it trivially easy
> to subclass Counter to add new functionality or just use dictionary
> comprehensions for bulk updates.
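The following standard-library sketch illustrates the point about `__missing__()` and dictionary comprehensions. Absent keys read as zero without being inserted. A bulk update is a one-line comprehension; capping every count at 2 is an arbitrary example.

```python
from collections import Counter

c = Counter("mississippi")  # i=4, s=4, p=2, m=1

# __missing__ returns 0 for absent keys without inserting them.
print(c["z"])    # 0
print("z" in c)  # False

# A bulk update as a plain dict comprehension: cap each count at 2.
capped = Counter({key: min(count, 2) for key, count in c.items()})
print(capped)
```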
> >
> >
> > It would be worthwhile to implement multiplication because, among other
> > reasons, Counters are a nice representation for discrete probability
> > distributions, for which multiplication is an even more fundamental
> > operation than addition.
> There is an open issue on this topic.  See:
> https://bugs.python.org/issue25478
> One stumbling point is that a number of commenters are fiercely opposed to
> non-integer uses of Counter. Also, some of the use cases (such as those
> found in Allen Downey's "Think Stats" and "Think Bayes" books) need
> division and rescaling to a total (i.e. normalizing the total to 1.0) for a
> probability mass function.
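The following sketch shows the probability use case using only the standard library. A Counter holds the number of ways to roll each sum of two six-sided dice, and the counts are then normalized so their values total 1. The dice example and the use of `Fraction` for exact values are illustrative choices. They are not taken from the books mentioned above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count the ways each sum can occur when rolling two six-sided dice.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Normalize so the probabilities sum to 1 (a probability mass function).
# Fraction keeps the values exact; float division would also work.
n = sum(ways.values())  # 36 equally likely outcomes
pmf = Counter({dice_sum: Fraction(count, n) for dice_sum, count in ways.items()})

print(pmf[7])             # 1/6
print(sum(pmf.values()))  # 1
```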
> If the idea were to go forward, it still isn't clear whether the correct
> API should be low level (__mul__ and __div__ and a "total" property) or
> higher level (such as a normalize() or rescale() method that produces a new
> Counter instance).  The low level approach has the advantage that it is
> simple to understand and that it feels like a logical extension of the
> __add__ and __sub__ methods.  The downside is that it doesn't really add any
> new capabilities (being just short-cuts for a simple dict comprehension or
> call to c.values()).  And, it starts to feature creep the Counter class
> further away from its core mission of counting and ventures into the realm
> of generic sparse arrays with numeric values.  There is also a
> learnability/intelligibility issue in that __add__ and __sub__ correspond to
> "elementwise" operations while __mul__ and __div__ would be "scalar
> broadcast" operations.
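The following sketch contrasts the two semantics mentioned above. The `+` and `-` operators on real Counters work elementwise and keep only positive results. A scalar rescale, by contrast, broadcasts one factor over every count. `rescale()` is a hypothetical helper that stands in for the higher-level API option; it is not part of `collections`. In Python 3, a division operator would be spelled `__truediv__` rather than `__div__`.

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2)

# Elementwise: + and - combine counts key by key. Counter's binary
# operators keep only positive results, so y (1 - 2 = -1) disappears.
print(a + b)  # Counter({'x': 4, 'y': 3})
print(a - b)  # Counter({'x': 2})


def rescale(counter, new_total=1.0):
    """Hypothetical higher-level helper (not in the standard library):
    return a new Counter whose counts are scaled to sum to new_total."""
    current = sum(counter.values())
    if current == 0:
        raise ValueError("cannot rescale a Counter whose counts sum to zero")
    factor = new_total / current
    # Scalar broadcast: every count is multiplied by the same factor.
    return Counter({key: count * factor for key, count in counter.items()})


print(rescale(a))  # Counter({'x': 0.75, 'y': 0.25})
```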
> Peter, I'm really glad you chimed in.  My advocacy lacked sufficient
> weight to move this idea forward.
> Raymond
