Re: [Numpy-discussion] please change mean to use dtype=float

22 Sep 2006

      ...
David M. Cooke wrote:
...
On Thu, 21 Sep 2006 11:34:42 -0700
Tim Hochberg  wrote:
...
Tim Hochberg wrote:
...
Robert Kern wrote:
...
David M. Cooke wrote:
...
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
> Let me offer a third path: the algorithms used for .mean() and .var()
> are substandard. There are much better incremental algorithms that
> entirely avoid the need to accumulate such large (and therefore
> precision-losing) intermediate values. The algorithms look like the
> following for 1D arrays in Python:
>
> def mean(a):
>      m = a[0]
>      for i in range(1, len(a)):
>          m += (a[i] - m) / (i + 1)
>      return m
>       
>         
>             
>               
This isn't really going to be any better than using a simple sum.
It'll also be slower (a division per iteration).
With one exception, every test that I've thrown at it shows that it's
better for float32. That exception is uniformly spaced arrays, like
linspace().
...
You do avoid
accumulating large sums, but then doing the division a[i]/len(a) and
adding that will do the same.
Okay, this is true.
...
Now, if you want to avoid losing precision, you want to use a better
summation technique, like compensated (or Kahan) summation:
def mean(a):
    s = e = a.dtype.type(0)
    for i in range(0, len(a)):
        temp = s
        y = a[i] + e
        s = temp + y
        e = (temp - s) + y
    return s / len(a)
...
> def var(a):
>      m = a[0]
>      t = a.dtype.type(0)
>      for i in range(1, len(a)):
>          q = a[i] - m
>          r = q / (i+1)
>          m += r
>          t += i * q * r
>      t /= len(a)
>      return t
>
> Alternatively, from Knuth:
>
> def var_knuth(a):
>      m = a.dtype.type(0)
>      variance = a.dtype.type(0)
>      for i in range(len(a)):
>          delta = a[i] - m
>          m += delta / (i+1)
>          variance += delta * (a[i] - m)
>      variance /= len(a)
>      return variance
>             
>               
I'm going to go ahead and attach a module containing the versions of 
mean, var, etc that I've been playing with in case someone wants to mess 
with them. Some were stolen from traffic on this list, for others I 
grabbed the algorithms from wikipedia or equivalent.
I looked into this a bit more. I checked float32 (single precision) and
float64 (double precision), using long doubles (float96) for the "exact"
results. This is based on your code. Results are compared using
abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in the
range of [-100, 900]
First, the mean. In float32, the Kahan summation in single precision is
better by about 2 orders of magnitude than simple summation. However,
accumulating the sum in double precision is better by about 9 orders of
magnitude than simple summation (7 orders more than Kahan).
In float64, Kahan summation is the way to go, by 2 orders of magnitude.
For the variance, in float32, Knuth's method is *no better* than the two-pass
method. Tim's code does an implicit conversion of intermediate results to
float64, which is why he saw a much better result.
Doh! And I fixed that same problem in the mean implementation earlier 
too. I was astounded by how good knuth was doing, but not astounded 
enough apparently.
Does it seem weird to anyone else that in:
    numpy_scalar <op> python_scalar
the precision ends up being controlled by the python scalar? I would 
expect the numpy_scalar to control the resulting precision just like 
numpy arrays do in similar circumstances. Perhaps the egg on my face is 
just clouding my vision though.
...
The two-pass method using
Kahan summation (again, in single precision), is better by about 2 orders of
magnitude. There is practically no difference when using a double-precision
accumulator amongst the techniques: they're all about 9 orders of magnitude
better than single-precision two-pass.
In float64, Kahan summation is again better than the rest, by about 2 orders
of magnitude.
I've put my adaptation of Tim's code, and box-and-whisker plots of the
results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
Conclusions:
- If you're going to calculate everything in single precision, use Kahan
summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of
the techniques in single-precision only.
- for speed+precision in the variance, either use Kahan summation in single
precision with the two-pass method, or use double precision with simple
summation with the two-pass method. Knuth buys you nothing, except slower
code :-)
The two pass methods are definitely more accurate. I won't be convinced 
on the speed front till I see comparable C implementations slug it out. 
That may well mean never in practice. However, I expect that somewhere 
around 10,000 items, the cache will overflow and memory bandwidth will 
become the bottleneck. At that point the extra operations of Knuth won't 
matter as much as making two passes through the array and Knuth will win 
on speed. 
OK, let me take this back. I looked at this speed effect a little more 
and the effect is smaller than I remember. For example, "a+=a" slows 
down by about a factor of 2.5 on my box between 10,000 and 100,0000 
elements. That's not insignificant, but assuming (naively) that this 
means that memory effects account to the equivalent of 1.5 add 
operations, this doesn't come close to closing to gap between Knuth and
Tim Hochberg wrote:
the two pass approach. The other thing is that while a+=a and similar 
operations show this behaviour, a.sum() and add.reduce(a) do not, 
hinting that perhaps it's only writing to memory that is a bottleneck. 
Or perhaps hinting that my mental model of what's going on here is badly 
flawed.

So, +1 for me on Kahan summation for computing means and two pass with 
Kahan summation for variances. It would probably be nice to expose the 
Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for 
the latter in summing a bunch of disparate matrices with high precision.

I'm on the fence on using the array dtype for the accumulator dtype 
versus always using at least double precision for the accumulator. The 
former is easier to explain and is probably faster, but the latter is a 
lot more accuracy for basically free. It depends on how the relative 
speeds shake out I suppose. If the speed of using float64 is comparable 
to that of using float32, we might as well. One thing I'm not on the 
fence about is the return type: it should always match the input type, 
or dtype if that is specified. Otherwise one gets mysterious, gratuitous 
upcasting.

On the subject of upcasting, I'm still skeptical of the behavior of 
mixed numpy-scalar and python-scalar ops. Since numpy-scalars are 
generally the results of indexing operations, not literals, I think that 
they should behave like arrays for purposes of determining the resulting 
precision, not like python-scalars. Otherwise situations arise where the 
rank of the input array determine the kind of output (float32 versus 
float64 for example) since if the array is 1D indexing into the array 
yields numpy-scalars which then get promoted when they interact with 
python-scalars.However if the array is 2D, indexing yields a vector, 
which is not promoted when interacting with python scalars and the 
precision remains fixed.

-tim
...
Of course the accuracy is pretty bad at single precision, so 
the possible, theoretical speed advantage at large sizes probably 
doesn't matter.
-tim
...
After 1.0 is out, we should look at doing one of the above.
+1
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
m   bnb