Robin Becker robin at reportlab.com
Wed Oct 27 20:21:30 CEST 2004

```Frohnhofer, James wrote:
>>Robin Becker wrote
>>...... interestingly the standard algorithms for means/deviations are
>>numerically poor. As an example
>>
>> >>> (1.e30+1-1.0e30)/3
>>0.0
>>
>>Where we have total loss of information in one of the terms.
>>
>>
>>There are better algorithms involving recursion,
>>
>>eg
>>
>>    mu[n] = (data[n]+(n-1)*mu[n-1])/n
>>The error in the existing estimate, mu[n-1], gets multiplied by the
>>number (n-1)/n which is less than one (ie the errors are damped).
>>
>
>
> I'm not sure I believe this.  It's been a while since I studied this, but I
> think I remember maintaining precision at the Frankenstein level:
> to a minimum, and try to do them at the end.
>
>
>>However, in the case of widely differing magnitudes I think
>>you need to
>>accumulate the sums specially.
>
>
>
> Well, its not even that hard:
>
>  >>> data=[1.e30,1.,-1.e30]
>  >>> sum(data)/len(data)
>  0.0
>  >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
>  >>> data
>  [-1e+030, 1e+030, 1.0]
>  >>> sum(data)/len(data)  #Assuming sum works the way I think it should . . .
>  0.33333333333333331
>
> But using the recursive
>  >>> def mu(lst):
>          n = len(lst)
>          if n==1:
>              return lst[0]
>          else:
>              return (lst[0] + ((n-1)*mu(lst[1:])))/n
>
>  >>> data=[1.e30,1.,-1.e30]
>  >>> mu(data)
>  0.0
>  >>> data.sort(lambda x,y:-cmp(abs(x),abs(y)))
>  >>> mu(data)
>  0.0
>
> it seems we lose even the ability to maintain the precision by sorting on the
> magnitude.

not so, the sort needs to be reverse absolute for the simplistic analysis to work
and even then I believe it may not suffice.

eg
>>> def mu(data):
... 	estimate = 0
... 	for i,v in enumerate(data):
... 		estimate = (v+i*estimate)/float(i+1)
... 	return estimate
...
>>> data=[1.e30,1.,-1.e30]
>>> data.sort(lambda x,y: -cmp(abs(x),abs(y)))
>>> data
[1e+030, -1e+030, 1.0]
>>> mu(data)
0.33333333333333331

Of course we now have an O(nLog(n)) average rather than O(n) so the numericists