[Python-ideas] statistics module in Python3.4

Larry Hastings larry at hastings.org
Fri Jan 31 05:58:20 CET 2014


On 01/30/2014 05:27 PM, Steven D'Aprano wrote:
> I'm hesitant to require two passes over the data in _sum. Some
> higher-order statistics like variance are currently implemented using
> two passes, but ultimately I've like to support single-pass algorithms
> that can operate on large but finite iterators.
>
> But I will consider it as an option.
>
> I'm also hesitant to make the promise that _sum will be
> order-independent. Addition in Python isn't: [...]

I concede that this is mostly outside my expertise, and the statistics 
module and the PEP were your doing.  So you're the expert here and I 
will defer to you.

But.  My dim understanding of the *whole point* of the new statistics 
module was that it valued correctness over raw performance.  I assumed 
sorting values from small to large** before summing was *exactly* the 
sort of thing it was written to do.  If all we wanted were Python's 
existing semantics, why bother writing statistics._sum() in the first 
place?  Just use sum().

On the other hand, I had missed the fact that this was an internal-only 
method.  If changing _statistics._sum so it reordered the iterable to 
preserve correctness wouldn't change the behavior of any supported 
external APIs, then obviously there's no need, and I'd prefer to leave 
it alone for 3.4.  If you decided to change it for 3.5 and people were 
relying on its old behavior, that would be on them.  (Though a comment 
saying "I might change this later" would be welcome... if true.)


> If you're worried about people coming to rely on this, and thus running
> into trouble in the future if Counters get treated specially for (say)
> weighted data, then I'd accept a warning in the docs, or even a runtime
> warning. But not an exception.

The statistics module isn't marked as provisional.  So the semantics 
that ship with 3.4 are going to be set in stone.  Changing them later 
simply won't be an option--that will break code.  If you want to treat 
Counter objects differently in the future than you do now, then I agree 
with Wolfgang: the best course of action would be to add an exception 
now.  But again I'll defer to your judgment about what's best for your 
module.


//arry/

** Or high-precision to low-precision.  You know what I mean, the 
classic "if you add large numbers first you throw away precision and can 
wind up with a different result" thing.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140130/2a7bbf0c/attachment.html>


More information about the Python-ideas mailing list