[Python-ideas] Pre-PEP: adding a statistics module to Python

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Aug 5 17:58:04 CEST 2013


On 2 August 2013 18:45, Steven D'Aprano <steve at pearwood.info> wrote:
> I have raised an issue on the tracker to add a statistics module to Python's
> standard library:
>
> http://bugs.python.org/issue18606
>
> and have been asked to write a PEP. Attached is my draft PEP. Feedback is
> requested, thanks in advance.

Having looked at the reference implementation I'm slightly confused
about the mode API/implementation. It took a little while for me to
understand what the ``window`` parameter is for (I was only able to
understand it by studying the source) but I've got it now.

ISTM that the mode class is splicing two fundamentally different
things together:
1) Finding the most frequently occurring values in some collection of data.
2) Estimating the location of the peak of a hypothetical continuous
probability distribution from which some real-valued numeric data is
drawn.
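
For reference, case 1) needs nothing more than counting. A minimal
sketch, assuming a hypothetical helper name ``most_common_values``
(this is not the reference implementation's API):

```python
from collections import Counter


def most_common_values(data):
    """Return a list of the most frequently occurring values in data.

    Hypothetical helper for illustration only. Ties are all returned,
    in order of first appearance.
    """
    counts = Counter(data)
    if not counts:
        raise ValueError("no mode for empty data")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]
```

This works for any hashable values, numeric or not, which is part of
why it is conceptually separate from case 2).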

Case 2) does not seem like something that is normally taught in
secondary school maths. It's also not common AFAIK in other statistical
packages (at least not under the name mode). If scipy has this then it
has a different name, because scipy.stats.mode just does case 1). The
same goes for MATLAB's mode function, MS Excel, LibreOffice and
basically anything else I can remember using.

Also, the API for invoking case 2), which is conceptually a completely
different thing, is to call
    mode(data, window=3)
which seems very cryptic given the significant conceptual and
algorithmic differences that are invoked as a result. (What's wrong
with using window=2 anyway?)
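
To illustrate how different the algorithm behind case 2) is, here is a
rough sketch of one possible windowed estimator: sort the data, find the
narrowest interval spanning ``window`` consecutive points, and return
its midpoint. This is an assumption for illustration only, not
necessarily what the reference implementation does, and
``estimate_peak`` is a hypothetical name.

```python
def estimate_peak(data, window=3):
    """Estimate where real-valued data is most densely clustered.

    Hypothetical sketch, not the reference implementation: returns the
    midpoint of the narrowest interval that contains `window`
    consecutive points of the sorted data.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    values = sorted(data)
    if len(values) < window:
        raise ValueError("need at least `window` data points")
    # Width of the interval starting at sorted position i.
    def width(i):
        return values[i + window - 1] - values[i]
    start = min(range(len(values) - window + 1), key=width)
    return (values[start] + values[start + window - 1]) / 2
```

Unlike counting, this only makes sense for ordered numeric data, and the
result need not be a value that appears in the data at all.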

I would suggest that mode should be split into two separate entities
for these two different operations. But then, I don't really expect
many people to use case 2), and it doesn't come under the "minimal"
specification described in the PEP. So instead I think that it should
just be removed, to simplify the documentation and implementation of
mode for the common case.


Oscar

