efficient running median

denis denis-bz-gg at t-online.de
Mon Oct 19 10:38:39 EDT 2009


Folks,
  approximate medians -- would you settle for 49 - 51 %  ? --
open up new possibilities, and there's quite a lot of work on that,
for huuuge datasets.

A class of problems:
from a data stream X1 X2 ... you want, every so often,
a histogram / quantile summary / distribution estimator such that
    H approximates the distribution of X, so
    median of H(t) ~ median of some of X.
Some parameters of this class:
    NH, size of H: 100 => H(p) ~ pth percentile of X
    A, how often you want H
    window, drop or age old data
    runtime, memory of course
    accuracy -- in theory, in practice

A histogram viewer in which you can change buckets, colors, zoom,
in REALTIME sounds tantalizing -- fast loop >> fast algorithm + slow
loop.
Anyone know of one off-the -shelf, python or anything else ?

Refs:
Manku et al., Approximate medians in one pass with limited memory,
1998, 10p
    under http://infolab.stanford.edu/~manku/papers.html
    nice tree pictures
    they optimize mem (NH * Nbuf) not runtime, and don't window

Suri +, Quantiles on Streams, 2006, 5p, http://www.cs.ucsb.edu/~suri/psdir/ency.pdf
    ~ 20 refs zzz

cheers
 -- denis



More information about the Python-list mailing list