Mailman 3 Re: [Numpy-discussion] numpy.mean still broken for large float32arrays - NumPy-Discussion

24 Jul 2014

      Thanks Julian, those seem like Nice improvements. The fact that it either does or doesnt work depending on the axis makes me a Little queesy; but yeah, checking that fp's do what You think they should, is unfortunately best left as the responsibility of the programmer.

-----Original Message-----
From: "Julian Taylor" 
Sent: ‎24-‎7-‎2014 14:56
To: "Discussion of Numerical Python" 
Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

On Thu, Jul 24, 2014 at 1:33 PM, Fabien  wrote:
...
Hi all,
On 24.07.2014 11:59, Eelco Hoogendoorn wrote:
...
np.mean isn't broken; your understanding of floating point number is.
I am quite new to python, and this problem is discussed over and over
for other languages too. However, numpy's summation problem appears with
relatively small arrays already:
py>import numpy as np
py>np.ones((4000,4000), np.float32).mean()
1.0
py>np.ones((5000,5000), np.float32).mean()
0.67108864000000001
A 5000*5000 image is not unusual anymore today.
In IDL:
IDL> mean(fltarr(5000L, 5000L)+1)
        1.0000000
IDL> mean(fltarr(7000L, 7000L)+1)
        1.0000000
IDL> mean(fltarr(10000L, 10000L)+1)
       0.67108864
I can't really explain why there are differences between the two
languages (IDL uses 32-bit, single-precision, floating-point numbers)
Fabien
something as simple as summation is already an interesting algorithmic
problem there are several ways do to with different speeds and
accuracies. E.g. pythons math.fsum is always exact to one ulp but is
very slow as it requires partial sorting. Then there is kahan
summation that has an accuracy of O(1) ulp but its about 4 times
slower than the naive sum.
In practice one of the better methods is pairwise summation that is
pretty much as fast as a naive summation but has an accuracy of
O(logN) ulp.
This is the method numpy 1.9 will use this method by default (+ its
even a bit faster than our old implementation of the naive sum):
https://github.com/numpy/numpy/pull/3685

but it has some limitations, it is limited to blocks fo the buffer
size (8192 elements by default) and does not work along the slow axes
due to limitations in the numpy iterator.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] numpy.mean still broken for large float32arrays

Eelco Hoogendoorn

tags

participants (1)