[Python-Dev] Accumulation module

Raymond Hettinger python at rcn.com
Wed Jan 14 07:40:43 EST 2004


> > * Is there a way to compute the standard deviation without multiple
> > passes over the data (one to compute the mean and a second to sum
the
> > squares of the differences from the mean?

[Tim]
> The textbook formula does it in one pass, by accumulating running
totals
> of
> the elements and of their squares.  It's numerically atrocious,
though,
> and
> should never be used (it can even end up delivering a negative
result!).
> Knuth gives a much better recurrence formula for this, doing one-pass
> computation of variance in a numerically stable way.

I found it.

There is a suggestion that the formula would be better off
if the recurrence sums are accumulated along with a cumulative
error term that tracks the loss of significance when adding
a small adjustment to a larger sum.  The textbook exercise 
indicates that that technique is the most useful in the presence
of crummy rounding.  Do you think there is any value in using
the error term tracking?

def stddev(data):
    # Knuth -- Seminumerical Algorithms 4.2.2 p.216
    it = iter(data)
    m = float(it.next())    # new running mean
    s = 0.0                 # running sum((x-mean)**2)
    k = 1                   # number of items
    dm = ds = 0             # carried forward error term
    for x in it:
        k += 1

        adjm = (x-m)/k - dm
        newm = m + adjm
        dm = (newm - m) - adjm
        
        adjs = (x-m)*(x-newm) - ds
        news = s + adjs
        ds = (news - s) - adjs
        s = news

        m = newm
    return (s / (k-1)) ** 0.5  # sample standard deviation

Without the error term tracking, the inner loop simplifies to:

        k += 1
        newm = m+(x-m)/k
        s += (x-m)*(x-newm)
        m = newm


Raymond Hettinger




More information about the Python-Dev mailing list