On Wed, Feb 7, 2018 at 2:57 AM, Benjamin Root wrote:
Note, the following is partly to document my explorations in computing lat/lon grids in numpy lately, in case others come across these issues in the future. It is also partly a plea for the development of numerically accurate functions for computing lat/lon grids from a combination of inputs: bounds, counts, and resolutions.

You're talking about accuracy several times; it would help to define what your actual requirements are, and to quantify what the accuracy loss currently is. I wouldn't expect it to be more than a few ulp for float64.
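For reference, one float32 ulp near the target value can be computed with np.spacing(); as a quick sketch (the error figure here is copied from the output of the C snippet further down):

```python
import numpy as np

# err1 is the float32 multiply-add error printed by the C snippet below
err1 = 2.13623e-6

# one float32 ulp (unit in the last place) in the neighborhood of 44.99
ulp = float(np.spacing(np.float32(44.99)))

print(ulp, err1 / ulp)  # the error is roughly half a float32 ulp
```

So even the plain float32 multiply-add lands within one ulp of the float32 target; the question is whether that last half-ulp matters for the application.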

I have been playing around with the decimal package a bit lately, and I discovered the concept of "fused multiply-add" (fma) operations for improved accuracy. I have come to realize that fma operations could be used to greatly improve the accuracy of linspace() and arange(). In particular, I have been needing improved results for computing latitude/longitude grids, which tend to be done in float32s to save memory (at least, this is true in the data I come across).

If you care about saving memory *and* accuracy, wouldn't it make more sense to do your computations in float64, and convert to float32 at the end? That must be much more accurate than any new float32 algorithm you come up with.

Ralf
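A sketch of that suggestion applied to the grid endpoint from the examples below, forcing every operation's precision explicitly so numpy's scalar promotion rules don't interfere:

```python
import numpy as np

res, cnt, x0 = 0.01, 7001, -115.0

# all-float32 pipeline: res, the product, and the sum each rounded to float32
direct32 = np.float32(np.float32(res) * np.float32(cnt)) + np.float32(x0)

# compute in float64 throughout, cast once at the end
cast32 = np.float32(np.float64(res) * cnt + np.float64(x0))

# compare both against the intended endpoint of -44.99
err_direct = abs(float(direct32) + 44.99)
err_cast = abs(float(cast32) + 44.99)
print(err_direct, err_cast)
```

On this example, the compute-in-float64-then-cast result lands on the float32 value nearest the true endpoint, while the all-float32 pipeline does not.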

Since fma is not available yet in python, please consider the following C snippet:

```
$ cat test_fma.c
#include <stdlib.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float res = 0.01;
    float x0 = -115.0;
    int cnt = 7001;
    float x1 = res * cnt + x0;
    float x2 = fma(res, cnt, x0);
    printf("x1 %.16f  x2 %.16f\n", x1, x2);
    float err1 = fabs(x1 - -44.990000000000000);
    float err2 = fabs(x2 - -44.990000000000000);
    printf("err1 %.5e  err2 %.5e\n", err1, err2);
    return 0;
}
$ gcc test_fma.c -lm -o test_fma
$ ./test_fma
x1 -44.9899978637695312  x2 -44.9900016784667969
err1 2.13623e-06  err2 1.67847e-06
```

And if you do the same in double precision, fma() still yields a significant accuracy gain over an explicit double-precision multiply-add.
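fma() isn't exposed in Python here, but the residual error of the plain double-precision multiply-add (the error a double fma would reduce) can at least be measured against an exact Decimal reference; a sketch:

```python
from decimal import Decimal

# plain double-precision multiply-add
plain = 0.01 * 7001 - 115.0

# the same computation carried out exactly: 0.01 * 7001 - 115 == -44.99
exact = Decimal("0.01") * 7001 + Decimal("-115.0")

# Decimal(plain) converts the double exactly, so this is the true error
err = abs(Decimal(plain) - exact)
print(err)
```

The error is tiny (far below one float32 ulp) but nonzero, because neither 0.01 nor -44.99 is exactly representable in binary floating point.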

Now, to the crux of my problem. It is next to impossible to generate a non-trivial numpy array of coordinates, even in double precision, without hitting significant numerical errors. That has led me down the path of using the decimal package (which doesn't play very nicely with numpy because of the lack of casting rules for it). Consider the following:
```
$ cat test_fma.py
from __future__ import print_function
import numpy as np
res = np.float32(0.01)
cnt = 7001
x0 = np.float32(-115.0)
x1 = res * cnt + x0
print("res * cnt + x0 = %.16f" % x1)
x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
print("linspace()[-1]: %.16f" % x[-1])

$ python test_fma.py
res * cnt + x0 = -44.9900015648454428
len(arange()): 7002  arange()[-1]:       -44.975044
linspace()[-1]: -44.9900016784667969
```
arange() just produces silly results here: it puts out an extra element, and the last value is well off. (Adding half of the resolution to the stop value is the workaround typically suggested on mailing lists to get around arange()'s half-open interval; I personally don't do this.)

linspace() is a tad better than arange(), but still slightly worse than just computing the endpoint straight out. It also produces a result that is on par with fma(), which is reassuring: it is about as accurate as it can be at the moment. This also matches almost exactly what I would get if I used Decimal()s and then cast to float32 for final storage, so that's not too bad either.
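A sketch of that Decimal cross-check (this assumes, as current numpy does, that linspace() sets the final element to the stop value before the dtype cast):

```python
from decimal import Decimal
import numpy as np

# exact endpoint: -115.0 + 0.01 * 7001 == -44.99, with no rounding anywhere
exact_end = Decimal("-115.0") + Decimal("0.01") * 7001

# cast once to float32 for final storage
stored = np.float32(float(exact_end))

lin_end = np.linspace(-115.0, -44.99, 7001, dtype='float32')[-1]
print(stored, lin_end)  # both are the float32 nearest -44.99
```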

So, does it make any sense to improve arange by utilizing fma() under the hood? Also, any plans for making fma() available as a ufunc?

Notice that most of my examples required knowing the number of grid points ahead of time. But what if I don't know that? What if I just have the bounds and the resolution? Then arange() is the natural fit, but as I showed, its accuracy is lacking, and you have to resort to some sort of hack to get a closed interval.
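For the bounds-plus-resolution case, one conventional workaround (a sketch; the helper name is mine) is to recover the count from the bounds and then let linspace() pin both endpoints, so the division's rounding error only has to stay below half a step:

```python
import numpy as np

def grid(start, stop, res, dtype=np.float32):
    # recover the number of steps; round() absorbs the division's rounding
    # error as long as it stays below half a step
    cnt = int(round((stop - start) / res)) + 1  # +1: both endpoints included
    return np.linspace(start, stop, cnt, dtype=dtype)

x = grid(-115.0, -44.99, 0.01)
print(len(x), x[0], x[-1])  # 7002 points, endpoints pinned
```

This dodges arange()'s half-open interval entirely, but of course it only works when the bounds really are an integral number of steps apart.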