[Python-ideas] Pre-PEP: adding a statistics module to Python

Sun Aug 4 15:53:37 CEST 2013

On Sun, Aug 4, 2013 at 6:41 AM, Clay Sweetser <clay.sweetser at gmail.com> wrote:
>
> On Aug 4, 2013 8:53 AM, "Eli Bendersky" <eliben at gmail.com> wrote:
>>
>> On Sun, Aug 4, 2013 at 12:07 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> > On 08/03/2013 07:00 PM, Eli Bendersky wrote:
>> >>
>> >>
>> >> While I'm somewhat -0.5 on the general idea of the statistics module
>> >> (competing with well-established, super-optimized and
>> >> by-themselves-famous numeric libraries Python has does not sound like
>> >> a worthy goal),
>> >
>> >
>> > Sure, competing with already established libraries is silly.
>> > Fortunately,
>> > that's not what is happening here.  This PEP is about providing a
>> > minimal,
>> > common set of statistics functions for the average person.
>>
>> I'm really not sure who this average person is, but everyone keeps
>> talking about him. Is it the same person for whom Dummies books are
>> written?
>>
>> Anyhow, "minimal" is a dangerous slope. With such a module in the
>> stdlib, I'm 100% sure we'll get a constant stream of - please add just
>> this function (from SciPy) - it's so useful to the "average person" -
>> requests. This is unavoidable. And it will be difficult to judge at
>> that point why certain functionality belongs or does not belong here.
>> So over time we'll end up with a partial Greenspun, by containing an
>> ad hoc, slow implementation of half of NumPy/SciPy.
>>
>> Efforts are better spent in writing a new tutorial on NumPy that shows
>> how to do the stuff statistics.py does. Call it "NumPy statistics for
>> the average person".
> By this same logic, had common modules such as math not already been
> proposed, any proposal to add them now would be rejected. Why have the math
> module, when NumPy is available? Why have asyncore (ill-designed as some may
> call it) or any of the port and connection libraries, when twisted and
> tornado are available? Would you want them removed when Python 4000 comes
> along?

Comparison with existing, historical code that pre-dated most of the
3rd party libs out there is irrelevant, of course. Had the stdlib been
designed today, I'm sure it would look different, and yet this is
not the situation we're in.

> If a good statistics module, with a well defined scope, is created, then I
> believe there will be minimal requests for additions.

On what is this belief based? Years of observing this mailing list?
Once you have foo and bar in "statistics", every discussion will end
up justifying why they are better than "baz" that was left out.

> For those requests that do come along, one only has to look at the mail
> archives to see how often a proposal for addition of something into a
> standard library module succeeds to know that it is unlikely that a
> statistics module will "accumulate" features.

Right, so it's better to nip it in the bud. There's a good reason the
stdlib does not grow new features every second Friday. It's because
there is a group of people who have to stick with it for years
maintaining all that code. It's perfectly OK to look critically at all
new proposals. Having one way to do it is a Python design goal.

Eli