[Numpy-discussion] improving arange()? introducing fma()?

Chris Barker chris.barker at noaa.gov
Fri Feb 9 12:19:40 EST 2018


On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gommers at gmail.com>
wrote:
>> It is partly a plea for some development of numerically accurate
>> functions for computing lat/lon grids from a combination of inputs:
>> bounds, counts, and resolutions.
Can you be more specific about what problems you've run into? I work with
lat-lon grids all the time, and have never had a problem.

float32 degrees gives you about 1 meter accuracy or better, so I can see
how losing a few digits might be an issue. Though I would argue that you
maybe shouldn't use float32 if you are worried about anything close to 1 m
accuracy -- or you should shift to a relative coordinate system of some sort.
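
To put a rough number on that (taking ~111 km per degree, a
back-of-the-envelope figure):

```
import numpy as np

# gap between adjacent float32 values near 115 degrees of longitude,
# converted to meters:
ulp_degrees = np.spacing(np.float32(115.0))   # ~7.6e-06 degrees
print(ulp_degrees * 111000.0)                 # ~0.85 meters
```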

>> I have been playing around with the decimal package a bit lately,
sigh. decimal is so often looked at as a solution to a problem it isn't
designed for. lat-lon is natively sexagesimal -- maybe we need that dtype
:-)

what you get from decimal is variable precision -- maybe a binary
variable-precision lib is a better answer, and that would be a good thing
to have easy access to in numpy. But in this case, if you want better
accuracy in a computation that will end up in float32, just use float64.
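
For instance, redoing the res * cnt + x0 line from the script quoted
further down in float64, casting once at the end, recovers the closest
float32 to -44.99:

```
import numpy as np

res = np.float32(0.01)   # 0.01 rounded to float32: ~0.0099999998

# accumulate the float32-rounded step (what the quoted script shows):
print("%.16f" % (np.float64(res) * 7001 - 115.0))
# -44.9900015648454428  -- res's rounding error, amplified 7001x

# keep the step in float64, cast once at the end:
print("%.16f" % np.float32(0.01 * 7001 - 115.0))
# -44.9900016784667969  -- the closest float32 to -44.99
```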

>> and I discovered the concept of "fused multiply-add" operations for
>> improved accuracy. I have come to realize that fma operations could be
>> used to greatly improve the accuracy of linspace() and arange().
arange() is problematic for non-integer use anyway, by its very definition
(getting the "end point" correct requires the right step, even without FP
error).

and would it really help with linspace? it computes a delta with one
division in fp, then multiplies it by an integer (represented in fp --
why? why not keep that an integer until the multiply?).
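
Roughly, the computation being described (a simplified sketch, not numpy's
actual source):

```
import numpy as np

def linspace_sketch(start, stop, num):
    delta = (stop - start) / (num - 1)    # one fp division
    steps = np.arange(num, dtype=float)   # step counts, held in fp
    return start + steps * delta          # one multiply + add per point
```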

>> In particular, I have been needing improved results for computing
>> latitude/longitude grids, which tend to be done in float32's to save
>> memory (at least, this is true in data I come across).
>
> If you care about saving memory *and* accuracy, wouldn't it make more
> sense to do your computations in float64, and convert to float32 at the
> end?

that does seem to be the easy option :-)


>> Now, to the crux of my problem. It is next to impossible to generate a
>> non-trivial numpy array of coordinates, even in double precision, without
>> hitting significant numerical errors.
>
I'm confused -- the example you posted doesn't have significant errors...


>> Which has led me down the path of using the decimal package (which
>> doesn't play very nicely with numpy because of the lack of casting rules
>> for it). Consider the following:
>> ```
>> $ cat test_fma.py
>> from __future__ import print_function
>> import numpy as np
>> res = np.float32(0.01)
>> cnt = 7001
>> x0 = np.float32(-115.0)
>> x1 = res * cnt + x0
>> print("res * cnt + x0 = %.16f" % x1)
>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>> print("linspace()[-1]: %.16f" % x[-1])
>>
>> $ python test_fma.py
>> res * cnt + x0 = -44.9900015648454428
>> len(arange()): 7002  arange()[-1]:       -44.975044
>> linspace()[-1]: -44.9900016784667969
>> ```
>> arange just produces silly results (puts out an extra element... adding
>> half of the resolution is typically mentioned as a solution on mailing
>> lists to get around arange()'s limitations -- I personally don't do this).
>
The real solution is "don't do that" -- arange is not the right tool for
the job.

Then there is this:

res * cnt + x0 = -44.9900015648454428
linspace()[-1]: -44.9900016784667969

that's as good as you are ever going to get with 32-bit floats...
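
A quick check shows the linspace value is the float32 nearest to -44.99,
off by less than half a ULP:

```
import numpy as np

print("%.16f" % np.float32(-44.99))             # -44.9900016784667969
print(np.spacing(np.float32(44.99)))            # float32 ULP here: ~3.8e-06
print(abs(-44.99 - float(np.float32(-44.99))))  # ~1.7e-06 error
```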

Though I just noticed something about your numbers -- there should be a
nice even base ten delta if you have 7001 gaps -- but linspace produces N
points, not N gaps -- so maybe you want:


In [17]: l = np.linspace(-115.0, -44.99, 7002)

In [18]: l[:5]
Out[18]: array([-115.  , -114.99, -114.98, -114.97, -114.96])

In [19]: l[-5:]
Out[19]: array([-45.03, -45.02, -45.01, -45.  , -44.99])


or, in float32 -- not as pretty:


In [20]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)

In [21]: l[:5]
Out[21]:
array([-115.        , -114.98999786, -114.98000336, -114.97000122,
       -114.95999908], dtype=float32)

In [22]: l[-5:]
Out[22]:
array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
       -44.99000168], dtype=float32)


but still as good as you get with float32, and exactly the same result as
computing in float64 and converting:



In [25]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)

In [26]: l[:5]
Out[26]:
array([-115.        , -114.98999786, -114.98000336, -114.97000122,
       -114.95999908], dtype=float32)

In [27]: l[-5:]
Out[27]:
array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
       -44.99000168], dtype=float32)



>> So, does it make any sense to improve arange by utilizing fma() under
>> the hood?
no -- this is simply not the right use-case for arange() anyway.


>> Also, any plans for making fma() available as a ufunc?
could be nice -- especially if used internally.
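
In the meantime, for float32 data you can get (almost) the effect of an
fma by computing in float64, where the product of two float32 values is
exact. A sketch, not a numpy API -- a true hardware fma can differ in rare
double-rounding cases:

```
import numpy as np

def fma32(a, b, c):
    # two 24-bit float32 significands multiply into at most 48 bits,
    # well under float64's 53, so the product below is exact; the add
    # rounds once in float64 and once more on the cast back:
    a, b, c = (np.asarray(v, dtype=np.float64) for v in (a, b, c))
    return (a * b + c).astype(np.float32)
```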


>> Notice that most of my examples required knowing the number of grid
>> points ahead of time. But what if I didn't know that? What if I just
>> have the bounds and the resolution? Then arange() is the natural fit,
>> but as I showed, its accuracy is lacking, and you have to do some sort
>> of hack to do a closed interval.
no -- it's not -- if you have the bounds and the resolution, you have an
over-specified problem. That is:

x_min + (n * delta_x) == x_max

If there is ANY error in delta_x, x_max, or x_min, then you'll get a
mismatch, which is why arange is not the answer. (You can make the
algorithm a bit more accurate, I suppose, but there is still fp-limited
precision -- if you can't exactly represent either delta_x or x_max, then
you CAN'T use the arange() definition and expect it to work consistently.)

The "right" way to do it is to compute N with: round((x_max - x_min) /
delta), and then use linspace:

linspace(x_min, x_max, N+1)

(note that it's too bad you need to do N+1 -- if I had to do it over again,
I'd use N as the number of "gaps" rather than the number of points --
that's more commonly what people want, if they care at all)

This way, you get a grid with the endpoints as exact as they can be, and
the deltas as close to each other as they can be as well.
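
In code, that recipe might look like this (a minimal sketch;
grid_from_bounds is an illustrative name, not an existing numpy function):

```
import numpy as np

def grid_from_bounds(x_min, x_max, delta):
    # recover the number of gaps by rounding, so a slightly inexact
    # delta (e.g. 0.01 in binary fp) still gives the intended count:
    n_gaps = int(round((x_max - x_min) / delta))
    # linspace pins both endpoints exactly and keeps the steps as
    # uniform as fp allows:
    return np.linspace(x_min, x_max, n_gaps + 1)

x = grid_from_bounds(-115.0, -44.99, 0.01)
print(len(x), x[0], x[-1])   # 7002 -115.0 -44.99
```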

maybe you can do a better algorithm in linspace to save an ULP, but it's
hard to imagine when that would matter.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov