Raymond, You asked:
* Is there a way to compute the standard deviation without multiple passes over the data (one to compute the mean and a second to sum the squares of the differences from the mean?
I do not understand the problem in getting stdev in a single pass. Perhaps I not understood your problem. If you have a series of values you merely have to calculate their sum and sum-of-squares and use the usual formula. Ignore this if this is dumb. -- Prof G A Vignaux Mathematical and Computing Sciences, Victoria University, PO Box 600, Work: +64 4 463 5276 Wellington, NZ Home: +64 4 934 7851 Tony.Vignaux.Remove_This_Bit@mcs.vuw.ac.nz Mobile: +64 21 89 7851 http://www.mcs.vuw.ac.nz/~vignaux
On Wednesday 2004-02-25 19:23, Tony Vignaux wrote:
Raymond,
You asked:
* Is there a way to compute the standard deviation without multiple passes over the data (one to compute the mean and a second to sum the squares of the differences from the mean?
I do not understand the problem in getting stdev in a single pass. Perhaps I not understood your problem. If you have a series of values you merely have to calculate their sum and sum-of-squares and use the usual formula.
The trouble with that is that it's numerically horrible. There are better ways, some of which were discussed in python-dev a few weeks ago. -- g
Tony Vignaux wrote:
Raymond,
You asked:
* Is there a way to compute the standard deviation without multiple passes over the data (one to compute the mean and a second to sum the squares of the differences from the mean?
I do not understand the problem in getting stdev in a single pass. Perhaps I not understood your problem. If you have a series of values you merely have to calculate their sum and sum-of-squares and use the usual formula.
True, if you can use exact numbers. When calculating with floats (also named real, because they are everything else but real), you unfortunately calculate two things at the same time: Your intended calculation, plus some propagating error. The trick is to keep the error propagation as small as possible, while keeping the algorithm readable and effective. To my knowledge, always adding the smallest values first is best possible for minimizing cutoff/rounding, although it involes sorting. Using pairwise addition of values might be a good compromize, but I neither know state of the art nor did I follw the thread.
Ignore this if this is dumb.
reals are dumb. ciao - chris -- Christian Tismer :^) mailto:tismer@stackless.com Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
Christian Tismer wrote:
To my knowledge, always adding the smallest values first is best possible for minimizing cutoff/rounding, although it involes sorting.
Oops, no, it is better to add those which are nearest! I should have read this earlier: http://www.owlnet.rice.edu/~caam420/lectures/fpsum.html ciao - chris p.s.: Note that products behave much better. Summation is really against the nature of float. -- Christian Tismer :^) mailto:tismer@stackless.com Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 mobile +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
* Is there a way to compute the standard deviation without multiple passes over the data (one to compute the mean and a second to sum the squares of the differences from the mean?
I do not understand the problem in getting stdev in a single pass. Perhaps I not understood your problem. If you have a series of values you merely have to calculate their sum and sum-of-squares and use the usual formula.
The sandbox contains the latest and greatest version which computes the standard deviation in a single pass using a recurrence formula that has much better numerical performance than the textbook approach. Raymond Hettinger
participants (4)
-
Christian Tismer
-
Gareth McCaughan
-
Raymond Hettinger
-
Tony Vignaux