[Python-ideas] statistics module in Python3.4
Steven D'Aprano
steve at pearwood.info
Fri Jan 31 02:27:05 CET 2014
On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote:
> On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
> >I think a much cleaner (and probably faster) implementation would be
> >to gather first all the types in the input sequence, then decide what
> >to return in an input order independent way.
>
> I'm willing to consider this a "bug fix". And since it's a new function
> in 3.4, we don't have an installed base. So I'm willing to consider
> fixing this for 3.4.
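Wolfgang's suggestion, deciding the return type from the full set of input types rather than from the order in which they appear, can be sketched roughly as follows. `sum_by_type_set` is a hypothetical name, not part of the statistics module. The sketch handles only int, Fraction and float, and the ranking float > Fraction > int is an assumption made for illustration. It records types while keeping an exact Fraction running total, so it needs only one pass, at the cost of converting every value.

```python
from fractions import Fraction

def sum_by_type_set(data):
    """Hypothetical sketch: sum ints, Fractions and floats, choosing the
    result type from the set of types seen, not from their order."""
    total = Fraction(0)
    seen = set()
    for x in data:              # one pass; works on any iterable
        seen.add(type(x))
        total += Fraction(x)    # exact: Fraction(float) loses nothing
    # Assumed ranking: any float wins, then Fraction, else int.
    if float in seen:
        return float(total)
    if Fraction in seen:
        return total
    return int(total)           # only ints (or no data) were seen
```

Reversing the input, e.g. `[0.25, Fraction(1, 2), 1]` instead of `[1, Fraction(1, 2), 0.25]`, gives the same float result, 1.75, because the decision depends only on which types occurred.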
I'm hesitant to require two passes over the data in _sum. Some
higher-order statistics like variance are currently implemented using
two passes, but ultimately I'd like to support single-pass algorithms
that can operate on large but finite iterators.
But I will consider it as an option.
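For comparison, single-pass variance is commonly computed with Welford's online algorithm. The sketch below is that textbook algorithm, not the statistics module's implementation; `variance_one_pass` is a hypothetical name, and unlike the module it works in plain floats rather than exact arithmetic.

```python
def variance_one_pass(data):
    """Sample variance via Welford's online algorithm (single pass).

    Illustrative sketch only; accumulates in floats.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n            # update the running mean
        m2 += delta * (x - mean)     # uses both old and new mean
    if n < 2:
        raise ValueError("variance requires at least two data points")
    return m2 / (n - 1)
```

Because it consumes its input exactly once, it accepts a generator such as `variance_one_pass(x / 10 for x in range(1000))` without building a list first.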
I'm also hesitant to make the promise that _sum will be
order-independent. Addition in Python isn't:
py> class A(int):
...     def __add__(self, other):
...         return type(self)(super().__add__(other))
...     def __repr__(self):
...         return "%s(%d)" % (type(self).__name__, self)
...
py> class B(A):
...     pass
...
py> A(1) + B(1)
A(2)
py> B(1) + A(1)
B(2)
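The order dependence above follows from Python's operator dispatch: the left operand's `__add__` runs first, and B merely inherits A's. The right operand gets priority only when its type is a subclass of the left operand's type and provides a different reflected method. The hypothetical class C below adds an `__radd__` to show that case; it is not part of the original example.

```python
class A(int):
    def __add__(self, other):
        return type(self)(super().__add__(other))
    def __repr__(self):
        return "%s(%d)" % (type(self).__name__, self)

class B(A):
    pass  # inherits A.__add__ and int's __radd__ unchanged

class C(A):
    # Hypothetical: overriding the reflected method lets C win on the right.
    def __radd__(self, other):
        return type(self)(int(self) + int(other))

print(A(1) + B(1))  # A(2): left operand's __add__ runs
print(B(1) + A(1))  # B(2)
print(A(1) + C(1))  # C(2): C.__radd__ is tried before A.__add__
print(C(1) + A(1))  # C(2): inherited __add__ uses type(self)
```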
[...]
> Yes, exactly. If the support for Counter is half-baked, let's prevent
> it from being used now.
I strongly disagree with this. Counters are currently treated the same
as any other iterable, and built-in sum and math.fsum don't treat them
specially:
py> from collections import Counter
py> c = Counter([1, 1, 1, 1, 1, 2])
py> c
Counter({1: 5, 2: 1})
py> sum(c)
3
py> from math import fsum
py> fsum(c)
3.0
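Using the same Counter, the gap between treating it as a plain iterable and treating it as weighted data is easy to see: iteration yields only the distinct keys, while `Counter.elements()` repeats each key as many times as its count.

```python
from collections import Counter
from math import fsum

c = Counter([1, 1, 1, 1, 1, 2])

# Iterating a Counter yields its distinct keys, so only 1 and 2 are summed.
print(sum(c))             # 3
print(fsum(c))            # 3.0

# Weighted interpretation: each key counted as many times as it occurs.
print(sum(c.elements()))                 # 7
print(sum(k * n for k, n in c.items()))  # 7
```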
If you're worried about people coming to rely on this, and thus running
into trouble in the future if Counters get treated specially for (say)
weighted data, then I'd accept a warning in the docs, or even a runtime
warning. But not an exception.
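A runtime warning of the kind described above, rather than an exception, might look like the hypothetical wrapper below. The function name, the message and the choice of `FutureWarning` are all assumptions for illustration; the statistics module itself does not do this.

```python
import statistics
import warnings
from collections import Counter
from collections.abc import Mapping

def mean_warn_on_mapping(data):
    """Hypothetical wrapper: warn, but do not raise, when given a mapping."""
    if isinstance(data, Mapping):  # Counter is a dict, hence a Mapping
        warnings.warn(
            "mapping argument: only its keys are used; this may change "
            "if weighted data is supported in future",
            FutureWarning,
            stacklevel=2,
        )
    return statistics.mean(data)  # still iterates the keys as before
```

Called as `mean_warn_on_mapping(Counter([1, 1, 1, 1, 1, 2]))`, it warns once and returns the mean of the keys, 1.5, exactly as `statistics.mean` would.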
--
Steven