On Fri, Dec 19, 2008 at 2:53 PM, John Hunter <jdh2358@gmail.com> wrote:

On Thu, Dec 18, 2008 at 8:27 PM, Bradford Cross
<bradford.n.cross@gmail.com> wrote:
> This is a new project I just released.
>
> I know it is C#, but some of the design and idioms would be nice in
> numpy/scipy for working with discrete event simulators, time series, and
> event stream processing.
>
> http://code.google.com/p/incremental-statistics/

I think an incremental stats module would be a boon to numpy or scipy.
Eric Firing has a nice module written in C with a pyrex wrapper
(ringbuf)

Please excuse my ignorance - what is the performance overhead of calling C via the pyrex wrapper? A lot of use cases for incremental statistics are discrete event systems where the calculations will be updated millions or billions of times; this was a concern I had about doing the project in C and calling across a wrapper. Maybe it was one of those entirely speculative and unfounded concerns. :-)

that does trailing incremental mean, median, std, min, max,
and percentile. It maintains a sorted queue to do the last three
efficiently, and handles NaN inputs.

I'm not sure whether our results hold universally, or even asymptotically, but we found that our implementation of order/rank statistics was faster when backed by partition-based selection algorithms operating on an array-based queue than our implementation of a sorted deque backed by a circular buffer.
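
To make the comparison concrete, here is a minimal Python sketch of the partition-selection approach: a quickselect run over a copy of the current window to get a rolling median. It is not the code from either library, and the names `quickselect` and `rolling_median` are made up for illustration.

```python
import random
from collections import deque


def quickselect(values, k):
    """Return the k-th smallest element (0-based) without fully sorting."""
    items = list(values)  # work on a copy so the window itself is untouched
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition of items[lo..hi] around the pivot value.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j          # target is in the left part
        elif k >= i:
            lo = i          # target is in the right part
        else:
            return items[k]  # target sits between the parts, equal to pivot


def rolling_median(stream, window):
    """Yield the median of the last `window` observations after each update."""
    buf = deque(maxlen=window)  # oldest observation is dropped automatically
    for x in stream:
        buf.append(x)
        n = len(buf)
        lower = quickselect(buf, (n - 1) // 2)
        upper = quickselect(buf, n // 2)
        yield (lower + upper) / 2


print(list(rolling_median([5, 1, 4, 2, 3], 3)))
```

Each update costs expected O(window) time here, versus O(log window) search plus O(window) insertion for a sorted array, so which one wins in practice probably depends on window size and constant factors.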

How does it handle NaN inputs exactly - does it just guard against them? That is the approach we took as well. We have a calculation guard that filters for both NaN and infinite values.
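
A calculation guard of the kind described above can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual C# code and not how ringbuf does it; the class name `GuardedMean` is hypothetical.

```python
import math


class GuardedMean:
    """Accumulating mean that skips NaN and infinite observations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Guard: non-finite inputs leave the state untouched.
        if not math.isfinite(x):
            return self.mean
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


m = GuardedMean()
for x in [1.0, float("nan"), 3.0, float("inf")]:
    m.update(x)
print(m.n, m.mean)
```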

I would like to see this
extended to include exponential or other weightings to do things like
incremental trailing exponential moving averages and variances.

This is a cool idea that I hadn't thought of. We do have an exponentially weighted mean, but ideally one could supply a weighting function to any statistic. We've been moving toward a more functional, combinator-style library design lately, and this is another step in that direction.
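
As a rough sketch of the exponentially weighted case, the following Python class updates a mean and variance in O(1) per observation using the usual recursive form with a smoothing factor. The class name `ExpWeightedStats` and the parameter `alpha` are assumptions for illustration, and seeding the mean with the first observation is just one possible convention.

```python
class ExpWeightedStats:
    """Incremental exponentially weighted mean and variance."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.mean = None  # unset until the first observation arrives
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            # Assumed convention: seed the mean with the first observation.
            self.mean = float(x)
            return self.mean, self.var
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return self.mean, self.var


stats = ExpWeightedStats(alpha=0.5)
for x in [1.0, 3.0]:
    print(stats.update(x))
```

A larger `alpha` discounts old observations faster; a general weighting function would need to keep the window around, whereas this particular weighting only needs the previous mean and variance.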

I don't know what the licensing terms are of this module, but it might
be a good starting point for an incremental numpy stats module, at
least if you were thinking about supporting a finite lookback window.

Yes, it sounds great! If you read the docs here: http://code.google.com/p/incremental-statistics/ you can see that we have taken care to build the library from the beginning for static, accumulating, and rolling cases. The rolling case is what you are referring to as a finite lookback window, whereas the accumulating case is an accumulating lookback window, and the static case is the typical "compute the mean of the entire series of observations at once" case. IMO, it turns out really nicely when you think this way from the beginning because you get a lot of code reuse and nice opportunities for composition.
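
To illustrate the reuse, here is a small Python sketch in which one core accumulator serves all three cases. It is not a port of our library; `Mean`, `static_mean`, `accumulating_mean`, and `rolling_mean` are hypothetical names chosen for the example.

```python
from collections import deque


class Mean:
    """Core accumulator: supports adding and removing one observation."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def add(self, x):
        self.n += 1
        self.total += x

    def remove(self, x):
        self.n -= 1
        self.total -= x

    @property
    def value(self):
        return self.total / self.n if self.n else float("nan")


def static_mean(xs):
    """Static case: the whole series at once."""
    m = Mean()
    for x in xs:
        m.add(x)
    return m.value


def accumulating_mean(stream):
    """Accumulating case: the lookback window grows with every observation."""
    m = Mean()
    for x in stream:
        m.add(x)
        yield m.value


def rolling_mean(stream, window):
    """Rolling case: only the last `window` observations count."""
    m = Mean()
    buf = deque()
    for x in stream:
        buf.append(x)
        m.add(x)
        if len(buf) > window:
            m.remove(buf.popleft())  # retire the oldest observation
        yield m.value


data = [1, 2, 3, 4]
print(static_mean(data))
print(list(accumulating_mean(data)))
print(list(rolling_mean(data, 2)))
```

A running sum like this can drift in floating point over very long streams, so a production version would likely want a more careful update.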

We have a copy of this in the py4science examples dir if you want to
take a look:

svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/trailstats
cd trailstats/
make
python movavg_ringbuf.py

Other things that would be very useful are incremental covariance and
regression.

Indeed. We have a bit on the dependence statistics side, but not much. Incremental dependence and regression are the two hot items on the backlog. :-)
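
For the accumulating case, one way this could look is a Welford-style update of the co-moments, from which covariance and a simple least-squares line fall out. This is a sketch under that assumption, not something either library ships; the class name `OnlineRegression` and its attributes are invented for the example.

```python
class OnlineRegression:
    """Incremental covariance and simple linear regression of y on x."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x           # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)  # old-mean times new-mean deviation
        self.sxy += dx * (y - self.mean_y)

    @property
    def covariance(self):
        """Sample covariance (n - 1 in the denominator)."""
        return self.sxy / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def slope(self):
        return self.sxy / self.sxx if self.sxx else float("nan")

    @property
    def intercept(self):
        return self.mean_y - self.slope * self.mean_x


reg = OnlineRegression()
for x, y in [(0, 1), (1, 3), (2, 5), (3, 7)]:  # points on y = 2x + 1
    reg.update(x, y)
print(reg.slope, reg.intercept, reg.covariance)
```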

JDH

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion