efficient running median
denis
denis-bz-gg at t-online.de
Mon Oct 19 10:38:39 EDT 2009
Folks,
approximate medians -- would you settle for 49 - 51 % ? --
open up new possibilities, and there's quite a lot of work on that,
for huuuge datasets.
A class of problems:
from a data stream X1 X2 ... you want, every so often,
a histogram / quantile summary / distribution estimator such that
H approximates the distribution of X, so
median of H(t) ~ median of some of X.
Some parameters of this class:
NH, size of H: 100 => H(p) ~ pth percentile of X
A, how often you want H
window, drop or age old data
runtime, memory of course
accuracy -- in theory, in practice
A histogram viewer in which you can change buckets, colors, zoom,
in REALTIME sounds tantalizing -- fast loop >> fast algorithm + slow
loop.
Anyone know of one off-the -shelf, python or anything else ?
Refs:
Manku et al., Approximate medians in one pass with limited memory,
1998, 10p
under http://infolab.stanford.edu/~manku/papers.html
nice tree pictures
they optimize mem (NH * Nbuf) not runtime, and don't window
Suri +, Quantiles on Streams, 2006, 5p, http://www.cs.ucsb.edu/~suri/psdir/ency.pdf
~ 20 refs zzz
cheers
-- denis
More information about the Python-list
mailing list