[Python-Dev] Accumulation module
Raymond Hettinger
python at rcn.com
Wed Jan 14 07:40:43 EST 2004
> > * Is there a way to compute the standard deviation without multiple
> > passes over the data (one to compute the mean and a second to sum
the
> > squares of the differences from the mean?
[Tim]
> The textbook formula does it in one pass, by accumulating running
totals
> of
> the elements and of their squares. It's numerically atrocious,
though,
> and
> should never be used (it can even end up delivering a negative
result!).
> Knuth gives a much better recurrence formula for this, doing one-pass
> computation of variance in a numerically stable way.
I found it.
There is a suggestion that the formula would be better off
if the recurrence sums are accumulated along with a cumulative
error term that tracks the loss of significance when adding
a small adjustment to a larger sum. The textbook exercise
indicates that that technique is the most useful in the presence
of crummy rounding. Do you think there is any value in using
the error term tracking?
def stddev(data):
# Knuth -- Seminumerical Algorithms 4.2.2 p.216
it = iter(data)
m = float(it.next()) # new running mean
s = 0.0 # running sum((x-mean)**2)
k = 1 # number of items
dm = ds = 0 # carried forward error term
for x in it:
k += 1
adjm = (x-m)/k - dm
newm = m + adjm
dm = (newm - m) - adjm
adjs = (x-m)*(x-newm) - ds
news = s + adjs
ds = (news - s) - adjs
s = news
m = newm
return (s / (k-1)) ** 0.5 # sample standard deviation
Without the error term tracking, the inner loop simplifies to:
k += 1
newm = m+(x-m)/k
s += (x-m)*(x-newm)
m = newm
Raymond Hettinger
More information about the Python-Dev
mailing list