[Python-ideas] statistics module in Python3.4

Fri Jan 31 02:27:05 CET 2014

On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote:
> On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang 
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
> >I think a much cleaner (and probably faster) implementation would be 
> >to gather first all the types in the input sequence, then decide what 
> >to return in an input order independent way.
> 
> I'm willing to consider this a "bug fix".  And since it's a new function 
> in 3.4, we don't have an installed base.  So I'm willing to consider 
> fixing this for 3.4.

I'm hesitant to require two passes over the data in _sum. Some 
higher-order statistics like variance are currently implemented using 
two passes, but ultimately I'd like to support single-pass algorithms 
that can operate on large but finite iterators.

But I will consider it as an option.
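
To illustrate what a single-pass algorithm over an iterator can look 
like, here is a minimal sketch of sample variance using Welford's 
method. It is not the code in the statistics module; the name 
streaming_variance is hypothetical, and the sketch assumes float-like 
inputs rather than the mixed numeric types _sum has to handle.

```python
def streaming_variance(data):
    """Sample variance in one pass over any iterable (Welford's method).

    Hypothetical helper for illustration; assumes float-like values.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        # Uses the updated mean, which keeps the recurrence exact.
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    return m2 / (n - 1)
```

Because the data is consumed exactly once, this also works on a 
generator or other iterator that cannot be rewound, which a two-pass 
approach cannot do without first copying the data into a list.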

I'm also hesitant to make the promise that _sum will be 
order-independent. Addition in Python isn't:

py> class A(int):
...     def __add__(self, other):
...             return type(self)(super().__add__(other))
...     def __repr__(self):
...             return "%s(%d)" % (type(self).__name__, self)
...
py> class B(A):
...     pass
...
py> A(1) + B(1)
A(2)
py> B(1) + A(1)
B(2)

[...]
> Yes, exactly.  If the support for Counter is half-baked, let's prevent 
> it from being used now.

I strongly disagree with this. Counters are currently treated the same 
as any other iterable, and built-in sum and math.fsum don't treat them 
specially:

py> from collections import Counter
py> c = Counter([1, 1, 1, 1, 1, 2])
py> c
Counter({1: 5, 2: 1})
py> sum(c)
3
py> from math import fsum
py> fsum(c)
3.0

If you're worried about people coming to rely on this, and thus running 
into trouble in the future if Counters get treated specially for (say) 
weighted data, then I'd accept a warning in the docs, or even a runtime 
warning. But not an exception.
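
As a rough sketch of the runtime-warning option (an assumption about 
how it might look, not anything in the statistics module), a wrapper 
could warn when handed a Counter and then sum it as an ordinary 
iterable. The function name sum_with_counter_warning and the wording 
of the message are hypothetical.

```python
import warnings
from collections import Counter


def sum_with_counter_warning(data):
    """Sum an iterable; warn, but do not raise, if it is a Counter.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, Counter):
        warnings.warn(
            "Counter is iterated over its keys; the counts are ignored",
            RuntimeWarning,
            stacklevel=2,
        )
    # Same behaviour as the built-in sum: a Counter yields its keys.
    return sum(data)
```

With c = Counter([1, 1, 1, 1, 1, 2]) this still returns 3, matching 
the built-in sum, whereas a weighted reading of the same Counter, 
sum(value * count for value, count in c.items()), would give 7.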

-- 
Steven