On Wed, May 1, 2013 at 1:24 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
HI,
On Wed, May 1, 2013 at 9:09 AM, Yaroslav Halchenko <lists@onerussian.com> wrote:
On Wed, 01 May 2013, Nathaniel Smith wrote:
not sure there is anything to fix here. Third-party code relying on a certain outcome of rounding error is likely incorrect anyway.
Yeah, seems to just be the standard floating point indeterminism. Using Matthew's numbers and pure Python floats:
In [9]: (0.49505185 + 0.53529587) + -0.13461665 Out[9]: 0.89573107
In [10]: 0.49505185 + (0.53529587 + -0.13461665) Out[10]: 0.8957310700000001
In [11]: _9 - _10 Out[11]: -1.1102230246251565e-16
Looks like a bug in pymvpa or its test suite to me.
well -- sure thing we will "fix" the unittest to not rely on precise correspondence any longer since released 1.7.1 is effected. So it is not a matter of me avoiding "fixing" pymvpa's "bug".
I brought it to your attention because
1. from e.g.
np.sum(data[:, 0]) - np.sum(data, axis=0)[0]
which presumably should be the same order of additions for 0-th column it is not clear that they do not have to be identical.
I agree it's surprising, but I guess it's reasonable for numpy to reserve the right to add these guys up in whatever order it chooses, and (in this case) maybe a different order for the axis=None, axis=X cases.
Also, y'all may have noticed that it is the presence of the second vector in the array which causes the difference in the sums of the first (see my first email in this thread). If this is an order effect I guess this means that the order of operations in an sum(a, axis=X) operation depends on the shape of the array. And it looks like it depends on memory layout:
In [24]: data = np.array([[ 0.49505185, 0], ....: [ 0.53529587, 0], ....: [-0.13461665, 0]])
In [25]: np.sum(data[:, 0]) - np.sum(data, axis=0)[0] Out[25]: 1.1102230246251565e-16
In [26]: data_F = np.array(data, order='F')
In [27]: np.sum(data_F[:, 0]) - np.sum(data_F, axis=0)[0] Out[27]: 0.0
Do we allow the results to be different for different memory layout?
Wasn't this the point of some of Mark Wiebe's optimization? As far as I understand he got rid of some of the C bias, and made calculations faster for Fortran contiguous arrays. I rather have speed for my Fortran arrays, then relying on float precision issues, that bite anyway when running on many different kinds of machines. (I usually have to lower some unit test precision during Debian testing of statsmodels.) Josef
2. so far they were identical across many numpy releases
3. they are identical on other architectures (e.g. amd64)
To me that is surprising. I would have guessed that the order is the same on 32 and 64 bit, but something about the precision of intermediate operations is different. I don't know enough about amd64 to guess what that could be. Bradley's suggestion seems kind of reasonable but it's strange that numpy should use intel-80 bit intermediate values differently for 32 and 64 bit.
Cheers,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion