[Python-ideas] Additions to collections.Counter and a Counter derived class

Wed Mar 15 13:39:58 EDT 2017

On Tue, Mar 14, 2017 at 08:52:52AM -0700, David Mertz wrote:

> But I can imagine an occasional need to, e.g. "find outliers."  However,
> that is not hard to spell as `mycounter.most_common()[-1*N:]`.  Or if your
> program does this often, write a utility function `find_outliers(...)`

That's not how you find outliers :-)

Just because a data point is uncommon doesn't mean it is an outlier. 
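
To make the distinction concrete, here is a rough sketch with made-up numbers. The "more than 3 median absolute deviations from the median" test is only one rule of thumb, picked for illustration; it is not the one true definition of an outlier.

```python
from collections import Counter
from statistics import median

# Made-up sample: 12 is the rarest value, 95 is the one far from the rest.
data = [10] * 5 + [11] * 4 + [12] + [95] * 3

# "Least common" value: the last entry of most_common().
rarest_value, rarest_count = Counter(data).most_common()[-1]

# A simple robust outlier rule (an assumption for this sketch): flag values
# more than 3 median absolute deviations (MAD) away from the median.
centre = median(data)
mad = median(abs(x - centre) for x in data)
outliers = {x for x in data if abs(x - centre) > 3 * mad}

print(rarest_value, rarest_count)
print(outliers)
```

Here the rarest value (12) sits right next to the bulk of the data, while 95 gets flagged even though it occurs three times.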

I don't think there's any good reason to want to find the "least common" 
values in a statistics context, but there might be other use-cases for 
it. For example, suppose we are interested in the *least* popular 
products being sold:

    Counter(order.item for order in orders)

We can get the best selling products easily, but not the duds that don't 
sell much at all.

However, the problem is that what we really need to see is the items 
that don't sell at all (count=0), and they won't show up, because a 
Counter built from the orders only contains items that were ordered at 
least once. So I don't think this is actually a useful feature.
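
A small sketch of the problem, using an invented catalogue and sales list (all names here are hypothetical). Seeding the Counter with a zero for every catalogue item is one possible workaround, not something Counter does for you.

```python
from collections import Counter

# Hypothetical data: everything we stock, and what was actually ordered.
catalogue = ["apple", "banana", "cherry", "durian"]
sold = ["apple", "banana", "apple", "cherry", "apple", "banana"]

counts = Counter(sold)

# Least common item among those that sold at least once.
least_sold = counts.most_common()[:-2:-1]

# The real dud never appears in the Counter at all.
durian_missing = "durian" not in counts

# Workaround: start every catalogue item at zero, then count the sales.
seeded = Counter(dict.fromkeys(catalogue, 0))
seeded.update(sold)
unsold = [item for item, n in seeded.items() if n == 0]

print(least_sold)
print(durian_missing)
print(unsold)
```

Without the seeding step, "cherry" looks like the worst seller and "durian" is simply invisible.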

> > 2) Undefined behavior when using Counter.most_common:
> > 'c', 'c']), when calling c.most_common(3), there are more than 3 "most
> > common" elements in c and c.most_common(3) will not always return the
> > same list, since there is no defined total order on the elements in c.
> >
> > Should this be mentioned in the documentation?
> >
>
> +1. I'd definitely support adding this point to the documentation.

The docs already say that "Elements with equal counts are ordered 
arbitrarily" so I'm not sure what more is needed.
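
If a reproducible result matters, one option is to break ties explicitly rather than lean on whatever order most_common() happens to give. A minimal sketch, assuming the elements can be compared with each other; `top_n` is a name made up for this example, not part of Counter.

```python
from collections import Counter

def top_n(counter, n):
    """Return the n highest-count (element, count) pairs, ties broken by element."""
    # Sort by descending count, then ascending element, so equal counts
    # always come out in the same order. Requires orderable elements.
    return sorted(counter.items(), key=lambda pair: (-pair[1], pair[0]))[:n]

# Four elements share the top count, so "the 3 most common" is ambiguous.
c = Counter(["d", "d", "c", "c", "b", "b", "a", "a"])
print(top_n(c, 3))
```

The choice of tie-breaker is arbitrary too, of course, but at least it is the same every time.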

-- 
Steve