[Python-Dev] Accumulation module

Raymond Hettinger python at rcn.com
Wed Jan 14 03:24:41 EST 2004


> > * What to call the module

[Aahz]
> stats

There is already a stat module.  Any chance of confusion?

The other naming issue is that some of the functions have
non-statistical uses:  product() is general purpose; nlargest() and
nsmallest() will accept any datatype (though most of the use cases are
with numbers).  Are there other general purpose (non-statistical)
accumulation/reduction formulas that go here?
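
To make the general-purpose point concrete, here is a minimal sketch of product().  The signature (including the start argument) is an assumption for illustration, not a settled API; it relies only on the items supporting multiplication, so nothing about it is specific to statistics.

```python
from fractions import Fraction
from functools import reduce
import operator

def product(iterable, start=1):
    # Hypothetical signature: multiply the items together, left to right.
    # Works in one pass and never holds the whole input in memory.
    return reduce(operator.mul, iterable, start)

print(product([1, 2, 3, 4]))                      # plain integers
print(product([Fraction(1, 2), Fraction(2, 3)]))  # exact rationals
```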


> > * What else should be in it?

[Matthias Klose]
> you may want to have a look at
> http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python.html


Ages ago, when the idea for this module first arose, a certain bot
recommended strongly against including any but the most basic
statistical functions (muttering something about the near impossibility
of doing it well in either Python or portable C and something about not
wanting to maintain anything that wasn't dirt simple).  His words would
have of course fallen on deaf ears, but a certain dictatorial type had
just finished teaching advanced programming skills to people who
couldn't operate a high school calculator.  Sooooo, no kurtosis for you,
no gamma function for me!

It's possible that chi-square or regression could slip in, but it would
require considerable cheerleading and a rare planetary alignment.


> > * What else should be in it?

[Jeremy]
> median()

> And a function like bins() or histogram() that accumulates 
> the values in buckets of some size.

That sounds beginner-simple and reasonably useful, though it would have
been nice if all the reduction formulas could work in one pass and
never need to hold the whole dataset in memory.
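
For what it's worth, a bucket counter can be done in one pass, whereas an exact median() generally needs the data kept around.  A minimal sketch follows; the name bins(), the width parameter, and the choice to key each bucket by its lower edge are all assumptions for illustration.

```python
from collections import Counter
import math

def bins(iterable, width):
    # Hypothetical helper: count how many values land in each bucket
    # [k*width, (k+1)*width), reading the input exactly once.
    counts = Counter()
    for x in iterable:
        counts[math.floor(x / width)] += 1
    # Report each bucket by its lower edge, in ascending order.
    return {k * width: n for k, n in sorted(counts.items())}

print(bins(iter([1, 2, 11, 12, 25]), 10))
```

Only the per-bucket counts are stored, so memory grows with the number of occupied buckets rather than with the size of the dataset.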

> >  Note, heapq is used for both (I use
> > operator.neg to swap between largest and smallest).

[Bernhard Herzog] 
> Does that mean nlargest/nsmallest only work for numbers?  I think it
> might be useful for e.g. strings too.

The plan was to make them work with anything defining __lt__; however,
if it is coded in Python and uses heapq, I don't see a straightforward
way around using operator.neg without wrapping everything in some kind
of sense-reversing object.
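
As a rough sketch of what the wrapper approach might look like: nlargest() below keeps a min-heap of the n best items seen so far and needs only <, and nsmallest() reuses it by wrapping each item in an object whose < is reversed.  The Reversed class and both function signatures are hypothetical, written only to illustrate the idea.

```python
import heapq
from itertools import islice

class Reversed:
    # Hypothetical wrapper: reverses the sense of < for the wrapped item.
    __slots__ = ("item",)

    def __init__(self, item):
        self.item = item

    def __lt__(self, other):
        return other.item < self.item

def nlargest(n, iterable):
    # Keep a min-heap of the n largest items seen so far; uses only <.
    it = iter(iterable)
    heap = list(islice(it, n))
    if not heap:
        return []
    heapq.heapify(heap)
    for x in it:
        if heap[0] < x:              # x beats the smallest item kept
            heapq.heapreplace(heap, x)
    heap.sort(reverse=True)          # largest first
    return heap

def nsmallest(n, iterable):
    # The largest items under the reversed ordering are the smallest ones.
    wrapped = nlargest(n, (Reversed(x) for x in iterable))
    return [w.item for w in wrapped]

print(nlargest(2, ["pear", "apple", "fig"]))
print(nsmallest(2, ["pear", "apple", "fig"]))
```

The cost is one extra object and one extra method call per comparison, which is the overhead that operator.neg avoids for plain numbers.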



Raymond Hettinger



