[Python-ideas] collections.Counter should implement __mul__, __rmul__

Raymond Hettinger raymond.hettinger at gmail.com
Sun Apr 15 20:05:55 EDT 2018



> On Apr 15, 2018, at 2:05 PM, Peter Norvig <peter at norvig.com> wrote:
> 
> For most types that implement __add__, `x + x` is equal to `2 * x`. 
> 
> ... 
> 
> 
> That is true for all numbers, list, tuple, str, timedelta, etc. -- but not for collections.Counter. I can add two Counters, but I can't multiply one by a scalar. That seems like an oversight. 

If you view the Counter as a sparse associative array of numeric values, it does seem like an oversight.  If you view the Counter as a Multiset or Bag, it doesn't make sense at all ;-)

From an implementation point of view, Counter is just a kind of dict that has a __missing__() method that returns zero.  That makes it trivially easy to subclass Counter to add new functionality, or to just use dictionary comprehensions for bulk updates.
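
A minimal sketch of both routes, for illustration only.  The ScalableCounter name is hypothetical (it is not part of the standard library), and the choice to scale every stored count by the same factor is an assumption about what multiplication ought to mean:

```python
from collections import Counter


class ScalableCounter(Counter):
    """Hypothetical Counter subclass that adds scalar multiplication."""

    def __mul__(self, scalar):
        # Scale every stored count by the same factor.
        return type(self)({key: count * scalar for key, count in self.items()})

    # Scalar multiplication commutes, so 2 * c can reuse the same method.
    __rmul__ = __mul__


c = Counter(a=1, b=2)

# Route 1: a bulk update with a plain dict comprehension, no subclass needed.
doubled = Counter({key: count * 2 for key, count in c.items()})

# Route 2: the same operation through the subclass.
s = ScalableCounter(c)
also_doubled = 2 * s
```

Either way the result compares equal to c + c, which is the property the original post asks for.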

>  
> 
> It would be worthwhile to implement multiplication because, among other reasons, Counters are a nice representation for discrete probability distributions, for which multiplication is an even more fundamental operation than addition.

There is an open issue on this topic.  See:  https://bugs.python.org/issue25478

One stumbling point is that a number of commenters are fiercely opposed to non-integer uses of Counter. In addition, some of the use cases (such as those found in Allen Downey's "Think Stats" and "Think Bayes" books) need division and rescaling to a total (i.e. normalizing the total to 1.0) for a probability mass function.
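
For concreteness, rescaling to a total of 1.0 can be sketched with a dict comprehension.  The normalize() helper below is a hypothetical free function, not an existing Counter method, and raising ValueError on an empty total is just one possible choice:

```python
from collections import Counter


def normalize(counter):
    """Return a new Counter whose values sum to 1.0 (hypothetical helper)."""
    total = sum(counter.values())
    if total == 0:
        raise ValueError("cannot normalize a Counter whose total is zero")
    # True division produces floats, i.e. a non-integer use of Counter.
    return Counter({key: count / total for key, count in counter.items()})


counts = Counter("aabbbc")   # a: 2, b: 3, c: 1
pmf = normalize(counts)      # values are now relative frequencies
```

Note that the result holds floats, which is exactly the non-integer usage that some commenters object to.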

If the idea were to go forward, it still isn't clear whether the correct API should be low level (__mul__ and __div__ and a "total" property) or higher level (such as a normalize() or rescale() method that produces a new Counter instance).  The low level approach has the advantage that it is simple to understand and that it feels like a logical extension of the __add__ and __sub__ methods.  The downside is that it doesn't really add any new capabilities (the new methods would just be shortcuts for a simple dict comprehension or a call to c.values()).  It also pushes the Counter class further away from its core mission of counting and into the realm of generic sparse arrays with numeric values.  There is also a learnability/intelligibility issue, in that __add__ and __sub__ correspond to "elementwise" operations while __mul__ and __div__ would be "scalar broadcast" operations.
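
To illustrate the elementwise versus scalar-broadcast distinction, here is a small sketch using only what Counter offers today.  The scaling step is written as a dict comprehension because Counter has no __mul__; the variable names are made up for the example:

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2)

# Elementwise: each key in one Counter is paired with the same key in the other.
added = a + b        # x: 3 + 1, y: 1 + 2
subtracted = a - b   # x: 3 - 1; y would be 1 - 2 = -1, so it is dropped

# Scalar broadcast: a single number is applied to every count.
scaled = Counter({key: count * 3 for key, count in a.items()})

# The "total" is already one call to values() away.
total = sum(a.values())
```

The subtraction result shows another wrinkle: Counter's binary operators discard non-positive counts, so any new operators would have to decide whether to follow that rule.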

Peter, I'm really glad you chimed in.  My advocacy lacked sufficient weight to move this idea forward.


Raymond




