[Python-ideas] Pre-PEP: adding a statistics module to Python

Joshua Landau joshua at landau.ws
Sun Aug 4 04:42:45 CEST 2013


On 4 August 2013 03:00, Eli Bendersky <eliben at gmail.com> wrote:

> On Sat, Aug 3, 2013 at 12:47 PM, Alexander Belopolsky
> <alexander.belopolsky at gmail.com> wrote:
> >
> > On Fri, Aug 2, 2013 at 1:45 PM, Steven D'Aprano
> > <steve at pearwood.info> wrote:
> >>
> >> I have raised an issue on the tracker to add a statistics module to
> >> Python's standard library:
> >>
> >> http://bugs.python.org/issue18606
> >>
> >> and have been asked to write a PEP. Attached is my draft PEP.
> >> Feedback is requested, thanks in advance.
> >
> >
> > The PEP does not mention statistics.sum(), but the reference
> > implementation includes it.  I am not sure stdlib needs the third sum
> > function after builtins.sum and math.fsum.  I think it will be better
> > to improve builtins.sum instead.
>
> While I'm somewhat -0.5 on the general idea of the statistics module
> (competing with well-established, super-optimized and
> by-themselves-famous numeric libraries Python has does not sound like
> a worthy goal),


I don't believe it is competing with them, in the general case. This is for
those cases where you would turn to numpy only with reluctance, or even be
forced to roll your own. Numpy is a beast that some people, me included,
haven't needed to learn yet, but statistics come in useful in a lot of
algorithms. Not to mention the full third-of-a-second lag to import numpy ;).


> I have to agree with Alexander w.r.t. "sum". Strongly
> -1 from me on having functions with the same name as existing stdlib
> functions but different functionality. This is very much unpythonic.
>

I don't agree that this is a segregation that has to happen, but I agree
that it's not something that stdlib does, AFAIK. I think that's a tradition
worth keeping. Additionally, it's not immediately obvious to any newcomer
why statistics.sum is implemented differently to builtins.sum - this should
be made evident from the name (akin to fsum).

statistics.sum is a statistical sum of numeric data optimised to be
correct. builtins.sum is, as far as the user can tell, just iterated
addition. They both have their place but they're different places and it
should be more immediately obvious where.
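
To make the difference concrete, here is a minimal sketch using only what
is in the stdlib today. The Fraction-based `exact` total is just my
stand-in for "a sum optimised to be correct"; I am not claiming it is how
the reference implementation of statistics.sum works.

```python
import math
from fractions import Fraction

data = [0.1] * 10

# builtins.sum: plain iterated float addition, so rounding error accumulates.
naive = sum(data)

# math.fsum: tracks multiple partial sums to avoid losing precision.
accurate = math.fsum(data)

# Hypothetical "correct" sum (an assumption for illustration): add the values
# exactly as rationals and round to float only once at the end.
exact = float(sum(Fraction(x) for x in data))

print(naive, accurate, exact)
```

The naive total drifts away from 1.0 while the other two do not, which is
the kind of difference a newcomer cannot guess from the name "sum" alone.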

Finally -- do we need math.fsum¹ if we have statistics.sum?

¹ I just noticed fsum says "a float is required" when given invalid data
despite accepting generic numerics.
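
A quick way to see what I mean in the footnote (a sketch; the exact
wording of the error message may differ between Python versions, so the
snippet only captures it rather than assuming what it says):

```python
import math
from decimal import Decimal
from fractions import Fraction

# fsum happily accepts ints, Fractions and Decimals, converting each to float.
mixed_total = math.fsum([1, Fraction(1, 2), Decimal("0.25")])

# Invalid data raises TypeError; the message text is version-dependent.
try:
    math.fsum(["not a number"])
except TypeError as exc:
    message = str(exc)

print(mixed_total)
print(message)
```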