[Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python

Thu Aug 8 18:36:40 CEST 2013

On 08/08/2013 16:48, Oscar Benjamin wrote:
> On 8 August 2013 15:28, Steven D'Aprano <steve at pearwood.info> wrote:
>> Attached is the second draft of the pre-PEP for adding a statistics module
>> to Python. A brief summary of the most important changes:
>
> It all looks good to me.
>
> About this part in the PEP though:
> '''
> Open and Deferred Issues
>
>      - At this stage, I am unsure of the best API for multivariate statistical
>        functions such as linear regression, correlation coefficient, and
>        covariance. Possible APIs include:
>
>          * Separate arguments for x and y data:
>            function([x0, x1, ...], [y0, y1, ...])
>
>          * A single argument for (x, y) data:
>            function([(x0, y0), (x1, y1), ...])
>
>          * Selecting arbitrary columns from a 2D array:
>            function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)
>
>          * Some combination of the above.
>
>        In the absence of a consensus of preferred API for multivariate stats,
>        I will defer including such multivariate functions until Python 3.5.
> '''
>
I tend to prefer the second form. If your data is in the form of a pair of
lists, then it's easy enough to zip them anyway.
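A minimal sketch of that point. It assumes a hypothetical `pairwise_covariance` function (not part of the draft PEP) that takes a single iterable of (x, y) pairs, i.e. the second form. Data held as two separate lists is adapted with `zip`:

```python
def pairwise_covariance(pairs):
    """Hypothetical sample covariance over an iterable of (x, y) pairs."""
    pairs = list(pairs)  # accept any iterable, including a zip object
    n = len(pairs)
    if n < 2:
        raise ValueError("covariance requires at least two data points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of products of deviations, divided by n - 1 (sample covariance).
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)

# Data already stored as (x, y) pairs...
paired = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# ...or as a pair of lists, zipped at the call site.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(pairwise_covariance(paired))
print(pairwise_covariance(zip(xs, ys)))
```

Both calls compute the same value, so the single-argument form costs the caller with separate lists only one `zip`.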

> I don't think there's been any discussion about this so there's no
> lack of consensus. Or would you just prefer to defer it for now?
>
> I'm just going to say that it basically doesn't matter which of the
> first two options you go for; the third one with the 2D array and
> indices is an unnecessary complication.
>
+1. It isn't trying to be numpy.
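The column selection that the third form would build into the API is a one-liner at the call site. This sketch uses a hypothetical table `rows` with columns a, x, y, z and `operator.itemgetter` to pull out columns x and y:

```python
from operator import itemgetter

# Hypothetical table: each row is [a, x, y, z].
rows = [
    [10, 1.0, 2.0, 0.5],
    [11, 2.0, 4.1, 0.7],
    [12, 3.0, 5.9, 0.9],
]

# Select columns x (index 1) and y (index 2) as (x, y) pairs,
# the shape the second API form expects.
pairs = list(map(itemgetter(1, 2), rows))
print(pairs)  # [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

# The same selection as two separate sequences, for the first API form.
xs, ys = zip(*pairs)
```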