[Python-ideas] [Python-Dev] Fwd: stats module Was: minmax() function ...

sunqiang sunqiang at gmail.com
Sat Oct 16 02:49:15 CEST 2010


On Sat, Oct 16, 2010 at 8:05 AM, geremy condra <debatem1 at gmail.com> wrote:
> On Fri, Oct 15, 2010 at 1:00 PM, Raymond Hettinger
> <raymond.hettinger at gmail.com> wrote:
>> Hello guys.  If you don't mind, I would like to hijack your thread :-)
>>
>> ISTM, that the minmax() idea is really just an optimization request.
>> A single-pass minmax() is easily coded in simple, pure-python,
>> so really the discussion is about how to remove the loop overhead
>> (there isn't much you can do about the cost of the two compares
>> which is where most of the time would be spent anyway).
>>
>> My suggestion is to aim higher.   There is no reason a single pass
>> couldn't also return min/max/len/sum and perhaps even other summary
>> statistics like sum(x**2) so that you can compute standard deviation
>> and variance.
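
A minimal sketch of the single-pass idea above, assuming plain numeric input. The name `summary` and the shape of its return tuple are made up for illustration and are not part of any proposal:

```python
def summary(iterable):
    """Return (minimum, maximum, count, total, total_sq) from one pass.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("summary() arg is an empty sequence")
    lo = hi = total = first
    total_sq = first * first
    n = 1
    for x in it:
        # At most two compares per element, as in a hand-written minmax().
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
        total += x
        total_sq += x * x
        n += 1
    return lo, hi, n, total, total_sq


lo, hi, n, total, total_sq = summary([2, 4, 4, 4, 5, 5, 7, 9])
mean = total / float(n)
# Population variance from the running sums: E[x^2] - (E[x])^2.
variance = total_sq / float(n) - mean ** 2
```

The sum-of-squares formula is convenient for a single pass, but it can lose precision when the mean is large relative to the spread, so a real module would probably want something more careful.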
>
> +1 from me. Here's a normal cdf and chi squared cdf approximation I
> use for randomness testing. They may need to be refined for inclusion,
> but you're welcome to use them if you'd like.
>
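> from math import sqrt, erf
>
> def normal_cdf(x, mu=0, sigma=1):
>        """Approximates the normal cumulative distribution"""
>        # Standardize with (x - mu); 0.5 avoids integer division on Python 2.
>        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
>
> def chi_squared_cdf(x, k):
>        """Approximates the cumulative chi-squared statistic with k degrees
>        of freedom (Wilson-Hilferty cube-root approximation)."""
>        k = float(k)  # force true division on Python 2
>        # (x/k)**(1/3) is roughly normal with mean 1 - 2/(9k), variance 2/(9k).
>        numerator = (x / k) ** (1.0 / 3) - (1 - 2 / (9 * k))
>        denominator = (1.0 / 3) * sqrt(2 / k)
>        return normal_cdf(numerator / denominator)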
>
>> A few years ago, Guido and other python devvers supported a
>> proposal I made to create a stats module, but I didn't have time
>> to develop it.  The basic idea was that python's batteries should
>> include most of the functionality available on advanced student
>> calculators.  Another idea behind it was that we could invisibly
>> do-the-right-thing under the hood to help users avoid numerical
>> problems (e.g. math.fsum(s)/len(s) is a more accurate way to
>> compute an average because it doesn't lose precision when
>> building-up the intermediate sums).
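
To make the fsum point concrete, here is a small sketch comparing the two ways of averaging. The helper names `naive_mean` and `accurate_mean` are hypothetical, chosen only for this example:

```python
from math import fsum


def naive_mean(data):
    """Average using the builtin sum(), which accumulates rounding error."""
    data = list(data)
    return sum(data) / len(data)


def accurate_mean(data):
    """Average using math.fsum(), which tracks the lost low-order bits."""
    data = list(data)
    return fsum(data) / len(data)


values = [0.1] * 10
# sum(values) drifts slightly below 1.0, while fsum(values) is exactly 1.0,
# so only the second average comes out as 0.1.
print(naive_mean(values) == 0.1)
print(accurate_mean(values) == 0.1)
```

The user only ever calls a mean function; the choice of summation stays hidden, which is the "under the hood" part.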
>
> Can you give some other examples? Sage does some of this and I
> frequently find it annoying, actually, but I'm not sure if you're
> referring to the same things there.
I saw a blog post [1] several months ago via reddit [2]; it may be
worth reading.
[1]: http://www.johndcook.com/blog/2010/06/07/math-library-functions-that-seem-unnecessary/
[2]: http://www.reddit.com/r/programming/comments/ccbja/math_library_functions_that_seem_unnecessary/
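
One more example in the same spirit as the fsum case, as a small sketch: math.log1p(x) computes log(1 + x) without first forming 1 + x, which matters when x is tiny. The variable names below are only for illustration:

```python
from math import log, log1p

x = 1e-18

# 1 + x rounds to exactly 1.0 in double precision, so the information
# in x is gone before log() is even called.
naive = log(1 + x)

# log1p() works from x directly and keeps the result accurate.
accurate = log1p(x)

print(naive)
print(accurate)
```

A function like this looks redundant next to log(), but it exists precisely because the obvious formula loses all of its significant digits for small arguments.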

> Geremy Condra


