[Python-ideas] Pre-PEP: adding a statistics module to Python
oscar.j.benjamin at gmail.com
Tue Aug 6 14:58:02 CEST 2013
On 6 August 2013 10:02, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Oscar Benjamin writes:
> > >> It's also not common AFAIK in other statistical packages
> > >> (at least not under the name mode).
> > >
> > > Press et al claim it is poorly known, but much better than the
> > > binning method. It saddens me that twenty years on, it's still
> > > poorly known.
> In what sense is it "better" than the binning method?
The book that Steven is referencing is written (primarily) for the
benefit of scientists. I think the expectation is that if you're
trying to estimate the mode of a continuously distributed quantity,
it is because, say, you have experimental data from a skewed
distribution. I'm not sure, though, as I've just borrowed a 1999 edition
(in C) from a colleague's desk and this particular method/algorithm
isn't included (it doesn't give any method to compute the mode).
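For anyone unfamiliar with the idea, here is a minimal sketch of one density-based way to estimate the mode of continuous data without choosing bins. It assumes the estimate comes from the spacing of the sorted values: the narrowest run of `window` + 1 consecutive points marks the densest region. I have not checked this against the Press et al. text, so treat it as an illustration of the general approach rather than their algorithm. `estimate_mode` and its `window` parameter are hypothetical names.

```python
def estimate_mode(data, window=3):
    """Estimate the mode of continuous data from the spacing of sorted values.

    Hypothetical helper (not from any library): for each run of
    `window` + 1 consecutive sorted values, the narrowest run marks where
    the data are densest, and its midpoint is returned as the estimate.
    """
    xs = sorted(data)
    if len(xs) <= window:
        raise ValueError("need more than `window` data points")
    # Width of each run xs[i] .. xs[i + window]; a narrow run means high density.
    best = min(range(len(xs) - window), key=lambda i: xs[i + window] - xs[i])
    return (xs[best] + xs[best + window]) / 2
```

For example, `estimate_mode([1.0, 2.0, 2.1, 2.2, 2.3, 5.0, 9.0])` picks the tight cluster from 2.0 to 2.3 and returns its midpoint, about 2.15. The `window` size plays a role similar to the bin width: larger values smooth more.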
> If you're
> working with tax data or subsidy data, your bins will be given to you
> (the brackets).
That's a good point. It would be useful if a mode function could use
the appropriate bins where they are predetermined. Of course you can
bin them yourself and call modes(). Scipy/Matlab etc. provide the
bin-counting functionality separately under hist or histogram rather
> Similarly for geographical data (political
> boundaries), and so on.
It's definitely your job to bin those!
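When the bins are predetermined (tax brackets, say), binning yourself is only a few lines. Here is a minimal sketch, assuming half-open bins `[edges[i], edges[i+1])` given as a sorted list of boundaries. `binned_modes` is a hypothetical helper, and it returns every bracket tied for the highest count rather than assuming a single peak.

```python
import bisect
from collections import Counter


def binned_modes(data, edges):
    """Return the bins (as (low, high) pairs) that hold the most values.

    Hypothetical helper: `edges` are predetermined, sorted bin boundaries
    such as tax brackets. A value x falls in bin [edges[i], edges[i+1]).
    Values outside [edges[0], edges[-1]) are ignored.
    """
    counts = Counter()
    for x in data:
        # Index of the bin whose lower edge is the last edge <= x.
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(edges) - 1:
            counts[i] += 1
    if not counts:
        return []
    top = max(counts.values())
    # Every bin tied for the highest count is a mode.
    return [(edges[i], edges[i + 1]) for i in sorted(counts) if counts[i] == top]
```

With `edges = [0, 10000, 40000, 100000]` and incomes `[5000, 12000, 15000, 30000, 50000, 7000, 8000]`, the first two brackets each hold three values, so both are reported.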
> I've almost never found choice of bins to be
> a problem (but my use cases are such that either the bins are given or
> they don't much matter because there's enough data to approximate a
> density graphically).
> Does it properly identify multiple modes (preferably including lower
> peaks), or does it involve a single-peakedness assumption?
It doesn't assume single-peakedness. There are a couple of strategies
for identifying possible additional modes after finding the first (see
> > My preference really is just that modes() returns a list of all
> > modes and the user should decide what to do with however many
> > values they get back.
> It might be useful to have helper functions or methods to make common
Perhaps modes() could return all modes, and mode() could return the
mode if there's exactly one and raise an error otherwise.
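To make the suggested split concrete, here is a minimal sketch for discrete data, assuming `modes()` returns all most-common values in first-seen order (an empty list for empty input) and `mode()` insists on a unique answer. Both functions are hypothetical illustrations of the proposal, not an existing API; `ValueError` stands in for whatever exception the module would define.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often, in first-seen order.

    Hypothetical helper: an empty input gives an empty list, leaving the
    caller to decide what to do with however many values come back.
    """
    counts = Counter(data)  # Counter keeps insertion order (Python 3.7+)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


def mode(data):
    """Return the single most common value; raise if it is not unique."""
    found = modes(data)
    if len(found) != 1:
        raise ValueError(f"no unique mode; found {len(found)} equally common values")
    return found[0]
```

So `modes([1, 2, 2, 3, 3])` gives `[2, 3]`, `mode([1, 2, 2, 3])` gives `2`, and `mode([1, 1, 2, 2])` raises because the answer is ambiguous.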