[Python-ideas] Expanding statistical functions in Python's std. lib.

Mike Graham mikegraham at gmail.com
Thu Sep 1 22:56:20 CEST 2011

On Tue, Aug 30, 2011 at 11:46 AM, Spectral One <ghostwriter402 at gmail.com> wrote:
> Wandering about, looking up statistics info for a program I was writing, I
> found a recommendation to add various useful 'special functions' to C's math
> library:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1069.pdf
> The arguments in that paper make a lot of sense to me, and apply well to
> Python. They came up with a good list, IMnsHO. I'd recommend implementing
> this list in some form as library functions in Python.
> Blindly copying wouldn't end up particularly 'Pythonic;' tweaking the API is
> required. Some of the selection choices, such as returning real only,
> ought to be reevaluated, for example. Obviously, any of the decisions to
> keep things C-like rather than object-oriented ought to shift, as well.
> Function names are only important as far as they are clear. I suggest naming
> per <general category><specific case> e.g. distribution_t(), or dist_F(),
> and include modification for algebraic order, as well, so gamma() and
> log_gamma(). That said, anything clear is fine.
> Thoughts on the matter? I noticed that the math library in 2.7+ added the
> gamma and log(gamma) functions, already, which was nice. Obviously, most, if
> not all, are already present in extension modules such as NumPy, but there
> is value in having these things built into the language. "Batteries
> included," and all that.
> By the by, if that is far too much for one suggestion, then please just
> treat this as a suggestion to add just the incomplete beta function.
> (P-values for binomial, F, and t are all nice, too, though with inc. beta,
> they aren't terrible to generate. I really think they should be included in
> the standard library.)
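
For concreteness, here is a rough sketch of what "not terrible to
generate" could look like once a regularized incomplete beta function is
available. This is only an illustration: the names inc_beta,
t_two_sided_p and binom_upper_tail are hypothetical (nothing like them
exists in the stdlib), and the continued-fraction evaluation is the
usual textbook approach, not something taken from the N1069 paper. Only
math.lgamma, math.log, math.log1p and math.exp are real stdlib calls.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    tiny = 1e-300  # guard against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction, in turn.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b) -- hypothetical helper name."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor x^a (1-x)^b / B(a, b), computed in log space via lgamma.
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t -- hypothetical helper name."""
    return inc_beta(df / 2.0, 0.5, df / (df + t * t))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), 1 <= k <= n -- hypothetical name."""
    return inc_beta(k, n - k + 1, p)
```

With one degree of freedom the t distribution is the Cauchy
distribution, so t_two_sided_p(1.0, 1) should come out at 0.5, and
binom_upper_tail(1, 2, 0.5) should match the hand-computed 0.75. The
F-distribution p-value follows the same pattern with different
arguments to inc_beta.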

I'm not sure there are many people who could make heavy use of
statistical functions but don't already have cause to be using
numpy/scipy. It would certainly be unfortunate if having a little more
statistics functionality in the stdlib discouraged people who should
be using numpy from doing so.

"Batteries included" has always been a bit of an oversell, and as a
Python user I don't have any expectation of being able to do
fairly-specialized work without third-party modules, nor do I think
it's necessarily a net gain for Python if I could.


