Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays
Perhaps it is a slightly semantic discussion; but all fp calculations have errors, and there are always strategies for making them smaller. We just don't happen to like the error in this case; but rest assured it won't be hard to find new cases of 'blatantly wrong' results, no matter what accumulator is implemented. That's no reason not to try to be clever about it, but there isn't going to be an algorithm that is best for all possible inputs, and in the end the most important thing is that the algorithm used is specified in the docs.

-----Original Message-----
From: "Alan G Isaac" <alan.isaac@gmail.com>
Sent: 25-7-2014 00:10
To: "Discussion of Numerical Python" <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote:
This isn't a bug report, but rather a feature request.
I'm not sure this statement is correct. The mean of a float32 array can certainly be computed as a float32. Currently this is not necessarily what happens, not even approximately. That feels a lot like a bug, even if we can readily understand how the algorithm currently used produces it.

To say whether it is a bug or not, don't we have to ask about the intent of `mean`? If the intent is to sum and divide, then it is not a bug. If the intent is to produce the mean, then it is a bug.

Alan Isaac
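For concreteness, a minimal sketch of the failure mode being discussed, assuming a naive first-to-last float32 accumulator; the arithmetic below illustrates that assumption and is not a claim about what any particular numpy release does:

    import numpy as np

    # float32 has a 24-bit significand, so integers above 2**24 start to lose
    # exactness: adding 1.0 to 2**24 rounds straight back to 2**24.
    big = np.float32(2 ** 24)
    print(big + np.float32(1.0) == big)      # True: the 1.0 is absorbed

    # A naive first-to-last float32 sum of n ones therefore stalls at 2**24,
    # and the resulting "mean" drops below 1.0 once n exceeds 2**24.
    n = 2 ** 25
    stalled_sum = min(float(big), float(n))  # what the naive accumulator would yield
    print(stalled_sum / n)                   # 0.5 instead of 1.0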
To elaborate on that point: knowing that numpy accumulates in a simple first-to-last sweep, and does not implicitly upcast, the original problem can be solved in several ways: by specifying a higher precision to sum with, or by a nested summation, like X.mean(0).mean(0) == 1.0. I personally like this explicitness, and am wary of numpy doing overly clever things behind the scenes, as I can think of other code that might break if things change too radically. For instance, I often sort large arrays with a large spread in magnitudes before summation, relying on the fact that summing the smallest values first gives the best precision. Any changes made to reduction behavior should try to be backwards compatible with such properties of straightforward reductions, or else a lot of code is going to break without warning.

I suppose using maximum precision internally, and nesting all reductions over multiple axes of an ndarray, are both easy-to-implement improvements that do not come with any drawbacks that I can think of. Actually, I am not so sure about the maximum precision, as I personally prefer to make an informed decision about the precision used, and get an error on a platform that does not support the specified precision, rather than obtain subtly or horribly broken results without warning when moving code to a different platform or compiler.
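A rough sketch of the two explicit workarounds mentioned above; the array X here is a stand-in chosen only for illustration, and how far the plain float32 call would drift depends on the numpy version and its summation strategy:

    import numpy as np

    # Stand-in for a large float32 array whose naive mean might drift.
    X = np.ones((4096, 4096), dtype=np.float32)

    # Workaround 1: ask for a higher-precision accumulator explicitly.
    m1 = X.mean(dtype=np.float64)

    # Workaround 2: nested reduction, one axis at a time, so each partial
    # sum stays small enough for float32 to represent accurately.
    m2 = X.mean(0).mean(0)

    print(m1, m2)    # both 1.0 for this input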