[Python-ideas] Fwd: stats module Was: minmax() function ...

Masklinn masklinn at masklinn.net
Fri Oct 15 22:56:46 CEST 2010


On 2010-10-15, at 22:01, Raymond Hettinger wrote:
> Drat.  This should have gone to python-ideas.
> Re-sending.
> 
> Begin forwarded message:
> 
>> From: Raymond Hettinger <raymond.hettinger at gmail.com>
>> Date: October 15, 2010 1:00:16 PM PDT
>> To: Python-Dev Dev <python-dev at python.org>
>> Subject: Fwd: [Python-ideas] stats module Was: minmax() function ...
>> 
>> Hello guys.  If you don't mind, I would like to hijack your thread :-)
>> 
>> ISTM that the minmax() idea is really just an optimization request.
>> A single-pass minmax() is easily coded in simple, pure-python,
>> so really the discussion is about how to remove the loop overhead
>> (there isn't much you can do about the cost of the two compares
>> which is where most of the time would be spent anyway).
>> 
>> My suggestion is to aim higher.   There is no reason a single pass
>> couldn't also return min/max/len/sum and perhaps even other summary
>> statistics like sum(x**2) so that you can compute standard deviation 
>> and variance.
>> 
>> A few years ago, Guido and other python devvers supported a
>> proposal I made to create a stats module, but I didn't have time
>> to develop it.  The basic idea was that python's batteries should
>> include most of the functionality available on advanced student
>> calculators.  Another idea behind it was that we could invisibly
>> do-the-right-thing under the hood to help users avoid numerical
>> problems (e.g. math.fsum(s)/len(s) is a more accurate way to
>> compute an average because it doesn't lose precision when
>> building-up the intermediate sums).
>> 
>> I think the creativity and energy of this group is much better directed
>> at building a quality stats module (perhaps with some R-like capabilities).
>> That would likely be a better use of energy than bike-shedding 
>> about ways to speed up a trivial piece of code that is ultimately
>> constrained by the cost of the compares per item.
>> 
>> my-two-cents,
>> 
>> 
>> Raymond
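Raymond's single-pass suggestion can be sketched in pure Python. This is a minimal sketch, assuming the input is an iterable of ints or floats; `summarize` and `Summary` are hypothetical names, not an existing stdlib API. Instead of accumulating sum(x**2), it uses Welford's running update for the variance, a swapped-in technique that avoids the precision loss of subtracting two large, nearly equal sums.

```python
from typing import Iterable, NamedTuple


class Summary(NamedTuple):
    count: int
    minimum: float
    maximum: float
    total: float
    mean: float
    variance: float  # sample variance (n - 1 denominator); nan if count < 2


def summarize(data: Iterable[float]) -> Summary:
    """Collect len, min, max, sum, mean and variance in a single pass."""
    count = 0
    total = 0.0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    minimum = maximum = 0.0
    for x in data:
        count += 1
        total += x
        if count == 1:
            minimum = maximum = x
        elif x < minimum:  # the same compares a plain minmax() loop pays for
            minimum = x
        elif x > maximum:
            maximum = x
        # Welford's update instead of accumulating sum(x**2)
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("summarize() arg is an empty iterable")
    variance = m2 / (count - 1) if count > 1 else float("nan")
    return Summary(count, minimum, maximum, total, mean, variance)


stats = summarize(float(s) for s in ["3.5", "1.25", "8.0"])
print(stats.minimum, stats.maximum, stats.count)  # 1.25 8.0 3
```

Since the input is consumed only once, it also works on generators. The running `total` is an ordinary float sum; where the sum must be as accurate as possible, math.fsum() is the better tool, as the quoted message points out.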

I think I'd still go with composable coroutines, the kind of thing David Beazley (dabeaz) shows and promotes in his training sessions. Maybe with a higher-level interface on top to make them easier to use, but they seem a perfect fit for this kind of job, where you build arbitrary data pipes including forks and joins.
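A minimal sketch of such a pipeline, using plain generator-based coroutines: a `broadcast` fork sends each item to several sinks, so min, max, len, sum and the sum of squares all come out of one pass over the data. All names here (`start`, `fold`, `transform`, `broadcast`, `feed`) are hypothetical helpers written for illustration, and the shared `results` dict is just one simple way to collect the outputs.

```python
import operator


def start(coro):
    """Prime a generator-based coroutine by advancing it to its first yield."""
    next(coro)
    return coro


def fold(func, initial, results, key):
    """Sink: fold every received item into results[key] with func."""
    results[key] = initial
    while True:
        item = yield
        results[key] = func(results[key], item)


def transform(func, target):
    """Stage: apply func to each item and pass the result downstream."""
    while True:
        item = yield
        target.send(func(item))


def broadcast(targets):
    """Fork: send every item to each of the target coroutines."""
    while True:
        item = yield
        for target in targets:
            target.send(item)


def feed(iterable, target):
    """Source: push each item of iterable into the pipeline exactly once."""
    for item in iterable:
        target.send(item)


results = {}
pipeline = start(broadcast([
    start(fold(min, float("inf"), results, "min")),
    start(fold(max, float("-inf"), results, "max")),
    start(fold(lambda acc, _: acc + 1, 0, results, "len")),
    start(fold(operator.add, 0, results, "sum")),
    start(transform(lambda x: x * x,
                    start(fold(operator.add, 0, results, "sum_sq")))),
]))
feed([3, 1, 4, 1, 5], pipeline)
print(results)
# {'min': 1, 'max': 5, 'len': 5, 'sum': 14, 'sum_sq': 52}
```

Each stage is primed with `start()` before anything is sent to it; adding another statistic means adding one more sink to the broadcast list.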

As others mentioned, generator-based coroutines in Python have to be primed (by calling next() once on them), which is kind of a pain, but the decorator to "fix" that is easy enough to write.
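A sketch of such a decorator, assuming nothing beyond the standard library: it creates the generator and calls next() on it once, so callers can send() immediately. Without priming, sending a non-None value to a just-started generator raises TypeError. `coroutine` and `averager` are illustrative names.

```python
import functools


def coroutine(func):
    """Decorator: create the generator and advance it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # the priming step callers would otherwise have to remember
        return gen
    return start


@coroutine
def averager():
    """Running mean: send numbers in, receive the mean so far back."""
    total = 0.0
    count = 0
    mean = None
    while True:
        value = yield mean
        total += value
        count += 1
        mean = total / count


avg = averager()      # already primed, no next() needed
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
```

functools.wraps keeps the decorated function's name and docstring, so the wrapped coroutine still introspects like the original.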

