On Sun, Sep 08, 2013 at 10:25:22AM -0700, Guido van Rossum wrote:
Steven, I'd like to just approve the PEP, given the amount of discussion that's happened already (though I didn't follow much of it). I quickly glanced through the PEP and didn't find anything I'd personally object to, but then I found your section of open issues, and I realized that you don't actually specify the proposed API in the PEP itself. It's highly unusual to approve a PEP that doesn't contain a specification. What did I miss?
You didn't miss anything, but I may have.
Should the PEP go through each public function in the module (there are only 11)? That may be a little repetitive, since most have the same, or almost the same, signatures. Or is it acceptable to just include an overview? I've come up with this:
The initial version of the library will provide univariate (single variable) statistics functions. The general API will be based on a functional model ``function(data, ...) -> result``, where ``data`` is a mandatory iterable of (usually) numeric data.
The author expects that lists will be the most common data type used, but any iterable type should be acceptable. Where necessary, functions may convert to lists internally. Where possible, functions are expected to conserve the type of the data values; for example, the mean of a list of Decimals should be a Decimal rather than a float.
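As a rough illustration of the type-conservation rule, the sketch below assumes the functions are importable from a standard-library module named ``statistics`` that implements this proposal. The variable names are only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# Assumption: a stdlib module named `statistics` implementing this proposal.
from statistics import mean

# Decimal in, Decimal out (not float).
decimal_mean = mean([Decimal("1.5"), Decimal("2.5")])

# Fraction in, Fraction out.
fraction_mean = mean([Fraction(1, 2), Fraction(3, 2)])

# Any iterable should be acceptable, including a one-shot generator.
generator_mean = mean(x for x in (1.0, 2.0, 3.0))

print(type(decimal_mean).__name__, decimal_mean)
print(type(fraction_mean).__name__, fraction_mean)
print(type(generator_mean).__name__, generator_mean)
```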
Calculating the mean, median and mode
The ``mean``, ``median`` and ``mode`` functions take a single mandatory argument and return the appropriate statistic, e.g.:
    >>> mean([1, 2, 3])
    2.0
``mode`` is the sole exception to the rule that the data argument must be numeric. It will also accept an iterable of nominal data, such as strings.
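A short sketch of the three averages side by side, again assuming a module named ``statistics`` that provides the functions as described. The sample data is made up for the example.

```python
import math

# Assumption: a stdlib module named `statistics` implementing this proposal.
from statistics import mean, median, mode

numbers = [1, 2, 3, 4, 4]

average = mean(numbers)      # arithmetic mean: 14 / 5
middle = median(numbers)     # middle value of the sorted data
most_common = mode(numbers)  # most frequent value

# mode also accepts nominal (non-numeric) data such as strings.
colours = ["red", "blue", "red", "green"]
favourite = mode(colours)

print(average, middle, most_common, favourite)
```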
Calculating variance and standard deviation
In order to be similar to scientific calculators, the statistics module will include separate functions for population and sample variance and standard deviation. All four functions have similar signatures, with a single mandatory argument, an iterable of numeric data, e.g.:
    >>> variance([1, 2, 2, 2, 3])
    0.5
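To make the sample/population distinction concrete, here is a sketch that calls all four functions on the same data. It assumes the names ``variance``, ``pvariance``, ``stdev`` and ``pstdev`` in a module named ``statistics``; the email itself only says there are four such functions.

```python
import math

# Assumption: these four names, in a stdlib module named `statistics`.
from statistics import pstdev, pvariance, stdev, variance

data = [1, 2, 2, 2, 3]  # mean is 2, sum of squared deviations is 2

sample_var = variance(data)       # divides by n - 1 = 4
population_var = pvariance(data)  # divides by n = 5
sample_sd = stdev(data)           # square root of the sample variance
population_sd = pstdev(data)      # square root of the population variance

print(sample_var, population_var, sample_sd, population_sd)
```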
All four functions also accept a second, optional, argument, the mean of the data. This is modelled on a similar API provided by the GNU Scientific Library. There are three use-cases for this argument, in no particular order:
1) The value of the mean is known *a priori*.
2) You have already calculated the mean, and wish to avoid calculating it again.
3) You wish to (ab)use the variance functions to calculate the second moment about some given point other than the mean.
In each case, it is the caller's responsibility to ensure that the given argument is meaningful.
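The first two use-cases can be sketched as follows, assuming the same ``statistics`` module and function names as above. The mean is passed positionally, since this overview does not fix a name for the parameter.

```python
import math

# Assumption: a stdlib module named `statistics` implementing this proposal.
from statistics import mean, pvariance, variance

data = [1, 2, 2, 2, 3]

# Calculate the mean once...
centre = mean(data)

# ...and hand it to the variance functions as the optional second argument,
# so they do not need to calculate it again.
sample_var = variance(data, centre)
population_var = pvariance(data, centre)

# With a correct mean, the results match the one-argument calls.
print(sample_var == variance(data), population_var == pvariance(data))
```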
Is this satisfactory or do I need to go into more detail?