On Mon, Jan 19, 2009 at 7:34 PM, Hans Meine <meine@informatik.uni-hamburg.de> wrote:

On Friday 19 December 2008 03:27:12 Bradford Cross wrote:

This is a new project I just released.

I know it is C#, but some of the design and idioms would be nice in numpy/scipy for working with discrete event simulators, time series, and event stream processing.

Hi, do you know about the boost accumulators project?

It's still in boost's sandbox, but I love its design, and it provides a large number of well-documented, mathematically sound estimators for variance, mean, etc.: http://boost-sandbox.sourceforge.net/libs/accumulators/doc/html/index.html

Just a heads-up, in case someone finds this useful here. (Don't know about people's fondness for boost and/or C++ here.)

Not a boost/C++ fan, but I like those projects. Incremental statistics have several advantages (besides the obvious one of getting an online estimate when the data arrive sequentially): they can be much more memory friendly in a Python context (for example, if you want to compute statistics for billions of samples, you could do it in mini-batches, and an incremental framework can help here), and they can often converge faster than an offline version even if you have all the data.
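To make the mini-batch idea concrete, here is a rough sketch of an incremental mean/variance accumulator in numpy. It is only an illustration: the class name RunningStats and its update method are made up for this example and are not part of numpy, scipy, or boost accumulators. The merge step uses the standard pairwise-update formula for combining the mean and sum of squared deviations of two groups of samples (commonly attributed to Chan, Golub and LeVeque).

```python
import numpy as np


class RunningStats:
    """Hypothetical accumulator: tracks count, mean, and variance of a stream."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, batch):
        """Fold one mini-batch (1-D array-like) into the running totals."""
        batch = np.asarray(batch, dtype=float)
        nb = batch.size
        if nb == 0:
            return
        batch_mean = batch.mean()
        batch_m2 = ((batch - batch_mean) ** 2).sum()
        delta = batch_mean - self.mean
        total = self.n + nb
        # Pairwise merge of (n, mean, m2) with the batch's own statistics.
        self.mean += delta * nb / total
        self.m2 += batch_m2 + delta ** 2 * self.n * nb / total
        self.n = total

    @property
    def variance(self):
        """Unbiased sample variance (ddof=1); NaN with fewer than two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Usage: feed the data in 100 chunks instead of holding one big reduction.
rng = np.random.default_rng(0)
data = rng.normal(size=10_000)

stats = RunningStats()
for chunk in np.array_split(data, 100):
    stats.update(chunk)

print(stats.n, stats.mean, stats.variance)
```

Only one chunk needs to be in memory at a time, so the same loop would work if the chunks were read from disk or arrived from a stream; the result should agree with data.mean() and data.var(ddof=1) up to floating-point rounding.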

I am not yet clear how pervasive those techniques are - I have looked at several papers which prove the convergence of several well-known algorithms, and implemented some of them (in particular an online EM algorithm for online estimation of mixtures of Gaussians, with Bayesian variations for sequential model comparison), and I would have expected them to be better known. I may just not be that familiar with the relevant fields, though.

cheers,

David