[Numpy-discussion] improving arange()? introducing fma()?

Ralf Gommers ralf.gommers at gmail.com
Wed Feb 7 01:09:45 EST 2018


On Wed, Feb 7, 2018 at 2:57 AM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Note, the following is partly to document my explorations in computing
> lat/lon grids in numpy lately, in case others come across these issues in
> the future. It is partly a plea for some development of numerically
> accurate functions for computing lat/lon grids from a combination of
> inputs: bounds, counts, and resolutions.
>

You're talking about accuracy several times; it would help to define what
your actual requirements are, and to quantify what the accuracy loss
currently is. I wouldn't expect it to be more than a few ulp for float64.

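Something along these lines (just a rough sketch) would express the error in
ulps; np.spacing(x) gives the size of one ulp at x in x's dtype:

```
import numpy as np

# error of the float32 linspace() endpoint, measured in float32 ulps
exact = -44.99
approx = np.linspace(np.float32(-115.0), np.float32(-44.99), 7001,
                     dtype=np.float32)[-1]
print(abs(float(approx) - exact) / np.spacing(np.float32(exact)))
```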

>
> I have been playing around with the decimal package a bit lately, and I
> discovered the concept of "fused multiply-add" operations for improved
> accuracy. I have come to realize that fma operations could be used to
> greatly improve the accuracy of linspace() and arange(). In particular, I
> have been needing improved results for computing latitude/longitude grids,
> which tend to be done in float32's to save memory (at least, this is true
> in data I come across).
>

If you care about saving memory *and* accuracy, wouldn't it make more sense
to do your computations in float64, and convert to float32 at the end? That
must be much more accurate than any new float32 algorithm you come up with.
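
For example, a minimal sketch of what I mean:

```
import numpy as np

# build the grid in float64, then do a single rounding step to float32
x64 = np.linspace(-115.0, -44.99, 7001)   # float64 throughout
x32 = x64.astype(np.float32)
print("%.16f" % x32[-1])
```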

Ralf



>
> Since fma is not available yet in python, please consider the following C
> snippet:
>
> ```
> $ cat test_fma.c
> #include <stdlib.h>
> #include <math.h>
> #include <stdio.h>
>
> int main(){
>     float res = 0.01;
>     float x0 = -115.0;
>     int cnt = 7001;
>     /* plain multiply-add: res*cnt rounded to float, then +x0 rounded again */
>     float x1 = res * cnt + x0;
>     /* fma(): res*cnt + x0 evaluated with a single rounding (in double,
>        since fma() takes doubles), then rounded once to float on assignment */
>     float x2 = fma(res, cnt, x0);
>     printf("x1 %.16f  x2 %.16f\n", x1, x2);
>     float err1 = fabs(x1 - -44.990000000000000);
>     float err2 = fabs(x2 - -44.990000000000000);
>     printf("err1 %.5e  err2 %.5e\n", err1, err2);
>     return 0;
> }
> $ gcc test_fma.c -lm -o test_fma
> $ ./test_fma
> x1 -44.9899978637695312  x2 -44.9900016784667969
> err1 2.13623e-06  err2 1.67847e-06
> ```
>
> And if you do the same in double precision, fma() still yields significantly
> better accuracy than an explicit double-precision multiply-add.
>
> Now, to the crux of my problem. It is next to impossible to generate a
> non-trivial numpy array of coordinates, even in double precision, without
> hitting significant numerical errors. This has led me down the path of
> using the decimal package (which doesn't play very nicely with numpy
> because of the lack of casting rules for it). Consider the following:
> ```
> $ cat test_fma.py
> from __future__ import print_function
> import numpy as np
> res = np.float32(0.01)
> cnt = 7001
> x0 = np.float32(-115.0)
> # direct multiply-add of the endpoint
> x1 = res * cnt + x0
> print("res * cnt + x0 = %.16f" % x1)
> # arange() with the usual "add half the resolution to the stop" hack
> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
> # linspace() with the point count known ahead of time
> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
> print("linspace()[-1]: %.16f" % x[-1])
>
> $ python test_fma.py
> res * cnt + x0 = -44.9900015648454428
> len(arange()): 7002  arange()[-1]:       -44.975044
> linspace()[-1]: -44.9900016784667969
> ```
> arange() just produces silly results here (it puts out an extra element;
> adding half of the resolution to the stop is the workaround typically
> mentioned on mailing lists to get around arange()'s limitations, but I
> personally don't do this).
>
> linspace() is a tad better than arange(), but still a little worse than
> just computing the value straight out. It also produces a result that is
> on par with fma(), which is reassuring: it is about as accurate as can be
> at the moment. This also matches almost exactly what I would get if I used
> Decimal()s and then cast to float32's for final storage, so that's not too
> bad either.
>
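> For reference, the Decimal comparison I mean is roughly this (a quick
> sketch; the endpoint is computed exactly in Decimal and only rounded at
> the end):
>
> ```
> from decimal import Decimal
> import numpy as np
>
> # exact endpoint in Decimal, then rounded for float32 storage
> ref = Decimal("-115.0") + Decimal("0.01") * 7001
> print("%.16f" % np.float32(float(ref)))
> ```
>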
> So, does it make any sense to improve arange by utilizing fma() under the
> hood? Also, any plans for making fma() available as a ufunc?
>
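> To make the ufunc question concrete, what I am imagining is roughly the
> helper below; since no such ufunc exists, it just emulates fma for float32
> data by going through float64 (ignoring rare double-rounding cases):
>
> ```
> import numpy as np
>
> def fma32(a, b, c):
>     # emulate fma for float32 inputs: multiply-add in float64, then a
>     # single rounding back to float32
>     prod = np.multiply(a, b, dtype=np.float64)
>     return (prod + np.asarray(c, dtype=np.float64)).astype(np.float32)
>
> print("%.16f" % fma32(np.float32(0.01), 7001, np.float32(-115.0)))
> ```
>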
> Notice that most of my examples required knowing the number of grid points
> ahead of time. But what if I didn't know that? What if I just have the
> bounds and the resolution? Then arange() is the natural fit, but as I
> showed, its accuracy is lacking, and you have to do some sort of hack to do
> a closed interval.
>
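> To make the bounds-plus-resolution case concrete, one obvious workaround (a
> sketch; it assumes the bounds are exact multiples of the resolution) is to
> recover the step count and hand it to linspace():
>
> ```
> import numpy as np
>
> start, stop, res = -115.0, -44.99, 0.01
> # number of steps of size `res` between the bounds
> nsteps = int(round((stop - start) / res))
> x = np.linspace(start, stop, nsteps + 1, dtype='float32')
> print(len(x), "%.16f" % x[-1])
> ```
>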
> Thoughts? Comments?
> Ben Root
> (donning flame suit)
>