Note: the following is partly to document my recent explorations in computing
lat/lon grids in numpy, in case others come across these issues in
the future. It is also partly a plea for the development of numerically
accurate functions for computing lat/lon grids from a combination of
inputs: bounds, counts, and resolutions.
I have been playing around with the decimal package a bit lately, and I
discovered the concept of "fused multiply-add" (fma) operations for improved
accuracy. I have come to realize that fma operations could be used to
greatly improve the accuracy of linspace() and arange(). In particular, I
have been needing improved results for computing latitude/longitude grids,
which tend to be done in float32's to save memory (at least, this is true
in the data I come across).
Since fma is not available yet in python, please consider the following C
snippet:
```
$ cat test_fma.c
#include <stdlib.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float res = 0.01;
    float x0 = -115.0;
    int cnt = 7001;
    /* explicit multiply-add: rounded after each operation */
    float x1 = res * cnt + x0;
    /* fused multiply-add: rounded only once, at the end */
    float x2 = fma(res, cnt, x0);
    printf("x1 %.16f x2 %.16f\n", x1, x2);
    float err1 = fabs(x1 - -44.990000000000000);
    float err2 = fabs(x2 - -44.990000000000000);
    printf("err1 %.5e err2 %.5e\n", err1, err2);
    return 0;
}
$ gcc test_fma.c -lm -o test_fma
$ ./test_fma
x1 -44.9899978637695312 x2 -44.9900016784667969
err1 2.13623e-06 err2 1.67847e-06
```
And if you do the same in double precision, fma() still yields
significantly better accuracy than an explicit double-precision multiply-add.
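Since fma isn't exposed in Python, one way to at least verify the double-precision half of that claim is to use Decimal as an exact reference. This sketch shows that the explicit float64 multiply-add still carries a rounding error (the fma comparison itself would have to be done in C, as above):

```python
from decimal import Decimal, getcontext

import numpy as np

# Enough digits to carry the products exactly (a float64 value needs
# up to ~57 significant decimal digits).
getcontext().prec = 60

res = np.float64(0.01)
cnt = 7001
x0 = np.float64(-115.0)

# Explicit multiply-add: rounded after each float64 operation.
x1 = res * cnt + x0

# Exact reference: feed the value actually stored in `res` (not the
# decimal literal 0.01) into exact Decimal arithmetic.
exact = Decimal(float(res)) * cnt + Decimal(float(x0))
err = abs(Decimal(float(x1)) - exact)
print("float64 multiply-add: %.17f" % x1)
print("error vs. exact result:", err)
```

The error is tiny in absolute terms, but it is there, and it is exactly the kind of error fma() removes.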
Now, to the crux of my problem. It is next to impossible to generate a
non-trivial numpy array of coordinates, even in double precision, without
hitting significant numerical errors. This has led me down the path of
using the decimal package (which doesn't play very nicely with numpy
because of the lack of casting rules for it). Consider the following:
```
$ cat test_fma.py
from __future__ import print_function
import numpy as np
res = np.float32(0.01)
cnt = 7001
x0 = np.float32(-115.0)
x1 = res * cnt + x0
print("res * cnt + x0 = %.16f" % x1)
x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
print("len(arange()): %d arange()[-1]: %16f" % (len(x), x[-1]))
x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
print("linspace()[-1]: %.16f" % x[-1])
$ python test_fma.py
res * cnt + x0 = -44.9900015648454428
len(arange()): 7002 arange()[-1]: -44.975044
linspace()[-1]: -44.9900016784667969
```
arange() just produces silly results (it puts out an extra element... adding
half of the resolution to the stop value is the workaround typically
mentioned on mailing lists to get around arange()'s half-open interval -- I
personally don't do this). linspace() is a tad better than arange(), but
still a little worse than just computing the value straight out. It also
produces a result on par with fma(), which is reassuring: it is about as
accurate as can be at the moment. It also matches almost exactly what I
would get if I used Decimal()s and then cast to float32's for final
storage, so that's not too bad either.
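For reference, that Decimal computation of the endpoint can be sketched like this (the decimal literals are exact, so the only rounding happens in the final cast):

```python
from decimal import Decimal

import numpy as np

# Compute the endpoint exactly in Decimal...
exact_end = Decimal("0.01") * 7001 + Decimal("-115.0")
print(exact_end)  # exactly -44.99

# ...then cast once to float32 for storage.
end32 = np.float32(float(exact_end))
print("%.16f" % end32)  # same value linspace() produced above
```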
So, does it make any sense to improve arange by utilizing fma() under the
hood? Also, any plans for making fma() available as a ufunc?
Notice that most of my examples required knowing the number of grid points
ahead of time. But what if I didn't know that? What if I just have the
bounds and the resolution? Then arange() is the natural fit, but as I
showed, its accuracy is lacking, and you have to resort to some sort of
hack to get a closed interval.
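The workaround I end up reaching for, assuming the span really is meant to be an integer number of steps, is to recover the count by rounding the ratio and then let linspace() pin both endpoints instead of accumulating steps the way arange() does:

```python
import numpy as np

start, stop, res = -115.0, -44.99, 0.01

# The ratio comes out very close to an integer (7001 here); round it
# rather than truncating, then add one for the closed interval.
cnt = int(round((stop - start) / res)) + 1
x = np.linspace(start, stop, cnt, dtype='float32')
print(len(x), x[0], x[-1])
```

This still silently does the wrong thing if the bounds aren't actually an integer number of steps apart, which is why I'd rather see a properly designed function for this in numpy itself.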
Thoughts? Comments?
Ben Root
(donning flame suit)