[Python-Dev] PEP 450 adding statistics module

Steven D'Aprano steve at pearwood.info
Mon Sep 9 14:07:58 CEST 2013

On Mon, Sep 09, 2013 at 05:44:43AM -0500, Skip Montanaro wrote:
> > However, it's common in economic statistics to have a rectangular
> > array, and extract both certain rows (tuples of observations on
> > variables) and certain columns (variables).  For example you might
> > have data on populations of American states from 1900 to 2012, and
> > extract the data on New England states from 1946 to 2012 for analysis.
>
> When Steven first brought up this PEP on comp.lang.python, my main
> concern was basically, "we have SciPy, why do we need this?" Steven's
> response, which I have come to accept, is that there are uses for
> basic statistics for which SciPy's stats module would be overkill.
>
> However, once you start slicing your data structure along more than
> one axis, I think you very quickly will find that you need numpy
> arrays for performance reasons, at which point you might as well go
> "all the way" and install SciPy. I don't think slicing along multiple
> dimensions should be a significant concern for this package.

I agree. I'm not interested in trying to compete with numpy in areas 
where numpy is best. That's a fight any pure-Python module is going to 
lose :-)

> Alternatively, I thought there was discussion a long time ago about
> getting numpy's (or even further back, numeric's?) array type into the
> core. Python has an array type which I don't think gets a lot of use
> (or love). Might it be worthwhile to make sure the PEP 450 package
> works with that? Then extend it to multiple dimensions? Or just bite
> the bullet and get numpy's array type into the Python core once and
> for all?

I haven't tested PEP 450 statistics with numpy arrays, but any sequence 
type ought to work. While I haven't done extensive testing on the 
array.array type, basic testing shows that it works as expected:

py> import array
py> import statistics
py> data = array.array('f', range(1, 101))
py> statistics.mean(data)
py> statistics.variance(data)
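
As a rough cross-check of the session above, here is a minimal sketch 
that compares array.array input against a plain list of the same 
values. It assumes the statistics module as it later shipped in the 
standard library (Python 3.4+); the variable names are only 
illustrative, and the closed-form n*(n+1)/12 for the sample variance 
of 1..n is used purely as a sanity check:

```python
import array
import math
import statistics

n = 100
as_list = [float(v) for v in range(1, n + 1)]
as_array = array.array('f', range(1, n + 1))  # single-precision floats

# The integers 1..100 are exactly representable as C floats, so both
# containers hold the same values and should give the same answers.
mean_list = statistics.mean(as_list)
mean_array = statistics.mean(as_array)
var_list = statistics.variance(as_list)
var_array = statistics.variance(as_array)

assert mean_array == mean_list
assert math.isclose(var_array, var_list)

# Sample variance of 1..n has the closed form n*(n+1)/12.
assert math.isclose(var_array, n * (n + 1) / 12)

print(mean_array, var_array)
```

Note that with typecode 'f', values that are not exactly representable 
in single precision would be rounded on the way into the array, so 
agreement with a list of Python floats is only guaranteed for data 
like this.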

