efficient running median
steven at REMOVE.THIS.cybersource.com.au
Wed Oct 14 00:03:49 CEST 2009
On Tue, 13 Oct 2009 12:59:47 -0700, Paul Rubin wrote:
> sturlamolden <sturlamolden at yahoo.no> writes:
>> > The obvious way to compute a running median involves a tree structure
>> > so you can quickly insert and delete elements, and find the median.
>> > That would be asymtotically O(n log n) but messy to implement.
>> QuickSelect will find the median in O(log n) time.
> That makes no sense, you can't even examine the input in O(log n) time.
If that were a relevant argument, no algorithms would ever be described
as running in O(log n).
Obviously to run in O(log n) you must have already built the tree. If you
include the time required to build the tree, then O(n) is the lower-bound
(because you have to see each element to put it in the tree) but that
applies to any data structure. The point is, once you have that tree, you
can perform repeated calculations in O(log n) time (or so it has been
For example, if you want the m running medians of m data points, you
might do the following:
find the median of element 1
find the median of element 1 and 2
find the median of element 1 through 3
find the median of element 1 through 4
find the median of element 1 through m
Each median calculation may take O(log n), where n varies from 1 to m,
giving a total order of:
log 1 + log 2 + log 3 + ... + log m
=> O(log m!)
steps, which is much better than:
1 + 2 + 3 + ... + m = 1/2*m*(m+1)
> Anyway, the problem isn't to find the median of n elements. It's to
> find n different medians of n different sets. You might be able to get
> close to linear time though, depending on how the data acts.
But since the data sets aren't independent, a tree structure seems like
the way to go to me. Inserting and deleting elements should be fast, and
assuming the claim of calculating the median in O(log n) is correct,
that's quite fast too.
More information about the Python-list