On Mon, Sep 09, 2013 at 05:44:43AM -0500, Skip Montanaro wrote:
However, it's common in economic statistics to have a rectangular array, and extract both certain rows (tuples of observations on variables) and certain columns (variables). For example you might have data on populations of American states from 1900 to 2012, and extract the data on New England states from 1946 to 2012 for analysis.
When Steven first brought up this PEP on comp.lang.python, my main concern was basically, "we have SciPy, why do we need this?" Steven's response, which I have come to accept, is that there are uses for basic statistics for which SciPy's stats module would be overkill.
However, once you start slicing your data structure along more than one axis, I think you will very quickly find that you need numpy arrays for performance reasons, at which point you might as well go "all the way" and install SciPy. I don't think slicing along multiple dimensions should be a significant concern for this package.
I agree. I'm not interested in trying to compete with numpy in areas where numpy is best. That's a fight any pure-Python module is going to lose :-)
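For small tables, the kind of two-axis extraction described above can still be done in plain Python before handing the numbers to the statistics module. What follows is only a rough sketch: the table layout, the select() helper, and all the numbers are made up for illustration and are not part of PEP 450.

```python
import statistics

# Hypothetical toy table (made-up numbers, not real data):
# one row per region, one column per year.
years = [2009, 2010, 2011, 2012]
table = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": [5.0, 5.0, 5.0, 5.0],
}

def select(table, years, wanted_rows, first_year, last_year):
    """Return only the wanted rows, restricted to the given year range.

    This is an illustrative helper, not something the statistics
    module provides.
    """
    # Column indices whose year falls inside the inclusive range.
    cols = [i for i, y in enumerate(years) if first_year <= y <= last_year]
    return {name: [table[name][i] for i in cols] for name in wanted_rows}

# Pick rows "A" and "B", columns 2010 through 2012.
subset = select(table, years, ["A", "B"], 2010, 2012)

# Each extracted row is an ordinary list, so statistics.mean accepts it.
means = {name: statistics.mean(vals) for name, vals in subset.items()}
print(means)
```

This is fine for a handful of rows and columns. Once the table gets large, the list comprehensions become the bottleneck, which is the point at which numpy is the better tool.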
Alternatively, I thought there was discussion a long time ago about getting numpy's (or even further back, numeric's?) array type into the core. Python has an array type which I don't think gets a lot of use (or love). Might it be worthwhile to make sure the PEP 450 package works with that? Then extend it to multiple dimensions? Or just bite the bullet and get numpy's array type into the Python core once and for all?
I haven't tested PEP 450 statistics with numpy arrays, but any sequence type ought to work. While I haven't done extensive testing on the array.array type, basic testing shows that it works as expected:
py> import array
py> import statistics
py> data = array.array('f', range(1, 101))
py> statistics.mean(data)
50.5
py> statistics.variance(data)
841.6666666666666
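To illustrate the "any sequence type ought to work" point a little further, here is a small sketch that feeds the same values to mean() and variance() as a list, a tuple, and an array.array. The choice of containers and the 'd' typecode are mine, picked only for illustration; this is a quick check, not a thorough test.

```python
import array
import statistics

values = list(range(1, 101))

# The same 100 values held in three different sequence types.
containers = {
    "list": values,
    "tuple": tuple(values),
    "array('d')": array.array("d", values),
}

# Compute mean and sample variance for each container.
results = {
    name: (statistics.mean(seq), statistics.variance(seq))
    for name, seq in containers.items()
}

for name, (mean, variance) in results.items():
    print(name, mean, variance)
```

All three containers should agree on the mean and, up to floating-point rounding, on the variance.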