Note: the following is partly to document my recent explorations in computing lat/lon grids in numpy, in case others come across these issues in the future. It is also partly a plea for the development of numerically accurate functions for computing lat/lon grids from a combination of inputs: bounds, counts, and resolutions.

I have been playing around with the decimal package a bit lately, and I discovered the concept of "fused multiply-add" (fma) operations for improved accuracy. I have come to realize that fma operations could be used to greatly improve the accuracy of linspace() and arange(). In particular, I have been needing improved results for computing latitude/longitude grids, which tend to be done in float32 to save memory (at least, this is true in the data I come across).

Since fma is not available yet in Python, please consider the following C snippet:

```
$ cat test_fma.c
#include <stdlib.h>
#include <math.h>
#include <stdio.h>

int main(){
    float res = 0.01;
    float x0 = -115.0;
    int cnt = 7001;
    /* explicit multiply-add: rounds after the multiply and again
       after the add */
    float x1 = res * cnt + x0;
    /* fma(): the arguments are promoted to double, the multiply-add
       is performed with a single rounding, and the result is rounded
       back to float on assignment */
    float x2 = fma(res, cnt, x0);
    printf("x1 %.16f x2 %.16f\n", x1, x2);
    float err1 = fabs(x1 - -44.990000000000000);
    float err2 = fabs(x2 - -44.990000000000000);
    printf("err1 %.5e err2 %.5e\n", err1, err2);
    return 0;
}
$ gcc test_fma.c -lm -o test_fma
$ ./test_fma
x1 -44.9899978637695312 x2 -44.9900016784667969
err1 2.13623e-06 err2 1.67847e-06
```

And if you do the same in double precision, fma() still yields a significant accuracy improvement over an explicit double-precision multiply-add.

Now, to the crux of my problem. It is next to impossible to generate a non-trivial numpy array of coordinates, even in double precision, without hitting significant numerical errors. This has led me down the path of using the decimal package (which doesn't play very nicely with numpy because numpy lacks casting rules for it). Consider the following:

```
$ cat test_fma.py
from __future__ import print_function
import numpy as np

res = np.float32(0.01)
cnt = 7001
x0 = np.float32(-115.0)

# straight float32 multiply-add for the endpoint
x1 = res * cnt + x0
print("res * cnt + x0 = %.16f" % x1)

# arange() is half-open, so pad the stop by half the resolution
# to (try to) get a closed interval
x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
print("len(arange()): %d arange()[-1]: %16f" % (len(x), x[-1]))

# cnt steps means cnt + 1 points in the closed interval
x = np.linspace(-115.0, -44.99, cnt + 1, dtype='float32')
print("linspace()[-1]: %.16f" % x[-1])

$ python test_fma.py
res * cnt + x0 = -44.9900015648454428
len(arange()): 7002 arange()[-1]: -44.975044
linspace()[-1]: -44.9900016784667969
```

arange() just produces silly results: padding the stop by half the resolution is the workaround typically suggested on mailing lists to get around its half-open interval (I personally don't do this), and even then the last element drifts well away from -44.99. linspace() is a tad better than arange(), but still slightly worse than computing the endpoint straight out. On the other hand, its result is on a par with fma(), which is reassuring: it is about as accurate as it can be at the moment. It also matches almost exactly what I would get if I used Decimal()s and then cast to float32 for final storage, so that's not too bad either.

So, does it make any sense to improve arange() by utilizing fma() under the hood? And are there any plans for making fma() available as a ufunc?

Notice that most of my examples required knowing the number of grid points ahead of time. But what if I don't know that? What if I just have the bounds and the resolution? Then arange() is the natural fit, but as I showed, its accuracy is lacking, and you have to resort to some sort of hack to get a closed interval.

Thoughts? Comments?

Ben Root (donning flame suit)
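
P.S. To make the "bounds plus resolution" case concrete, here is a minimal sketch of the kind of helper I have in mind (the name grid() and the rounding trick are mine, not anything that exists in numpy): it assumes the interval length is very nearly an integer multiple of the step, recovers the point count by rounding, lets linspace() do the even division in float64, and casts down to float32 only at the end for storage.

```
import numpy as np

def grid(start, stop, step):
    # Closed-interval grid from bounds and resolution. Assumes
    # (stop - start) is (very nearly) an integer multiple of step,
    # so the point count can be recovered by rounding away the
    # floating-point fuzz in the division.
    cnt = int(round((stop - start) / step)) + 1
    # do the even division in float64; cast down only for storage
    return np.linspace(start, stop, cnt).astype(np.float32)

x = grid(-115.0, -44.99, 0.01)
print(len(x), "%.16f" % x[-1])
# should give 7002 points, with the endpoint correctly rounded
# to float32: -44.9900016784667969
```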
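
P.P.S. For reference, the Decimal()-based computation I alluded to above looks roughly like this: every grid value is exact in decimal arithmetic, and the only rounding happens in the final cast to float32 (via a float64 intermediate, since float(Decimal) goes through a Python float).

```
from decimal import Decimal
import numpy as np

res = Decimal("0.01")
x0 = Decimal("-115.0")
cnt = 7002  # points in the closed interval [-115.0, -44.99]

# x0 + i*res is exact in Decimal for every i; the only rounding
# is the cast to float32 for storage
x = np.array([float(x0 + i * res) for i in range(cnt)], dtype='float32')
print("%.16f" % x[-1])
# should print -44.9900016784667969
```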