Mailman 3 improving arange()? introducing fma()? - NumPy-Discussion

7 Feb 2018

Also, right there is a good example of where the use of fma() could be of
value.

Cheers!
Ben Root

On Fri, Feb 9, 2018 at 4:56 PM, Eric Wieser <wieser.eric+numpy@gmail.com>
wrote:

> Can’t arange and linspace operations with floats be done internally
>
> Yes, and they probably should be - they’re done this way as a hack because
> the api exposed for custom dtypes is here
> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/include/numpy/ndarraytypes.h#L507-L511>,
> (example implementation here
> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/src/multiarray/arraytypes.c.src#L3711-L3721>)
> - essentially, you give it the first two elements of the array, and ask it
> to fill in the rest.
> 
>
> On Fri, 9 Feb 2018 at 13:17 Matthew Harrigan <harrigan.matthew@gmail.com>
> wrote:
>
>> I apologize if I'm missing something basic, but why are floats being
>> accumulated in the first place?  Can't arange and linspace operations with
>> floats be done internally similar to `start + np.arange(num_steps) *
>> step_size`?  I.e. always accumulate (really increment) integers to limit
>> errors.
>>
>> On Fri, Feb 9, 2018 at 3:43 PM, Benjamin Root <ben.v.root@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker <chris.barker@noaa.gov>
>>> wrote:
>>>
>>>> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gommers@gmail.com>
>>>> wrote:
>>>>>
>>>>>  It is partly a plea for some development of numerically accurate
>>>>>> functions for computing lat/lon grids from a combination of inputs: bounds,
>>>>>> counts, and resolutions.
>>>>>>
>>>>>
>>>> Can you be more specific about what problems you've run into -- I work
>>>> with lat-lon grids all the time, and have never had a problem.
>>>>
>>>> float32 degrees gives you about 1 meter accuracy or better, so I can
>>>> see how losing a few digits might be an issue, though I would argue that
>>>> you maybe shouldn't use float32 if you are worried about anything close to
>>>> 1m accuracy... -- or shift to a relative coordinate system of some sort.
>>>>
>>>
>>> The issue isn't so much the accuracy of the coordinates themselves. I am
>>> only worried about 1km resolution (which is approximately 0.01 degrees at
>>> mid-latitudes). My concern is with consistent *construction* of a
>>> coordinate grid with even spacing. As it stands right now. If I provide a
>>> corner coordinate, a resolution, and the number of pixels, the result is
>>> not terrible (indeed, this is the approach used by gdal/rasterio). If I
>>> have start/end coordinates and the number of pixels, the result is not bad,
>>> either (use linspace). But, if I have start/end coordinates and a
>>> resolution, then determining the number of pixels from that is actually
>>> tricky to get right in the general case, especially with float32 and large
>>> grids, and especially if the bounding box specified isn't exactly divisible
>>> by the resolution.
>>>
>>>
>>>>
>>>> I have been playing around with the decimal package a bit lately,
>>>>>>
>>>>>
>>>> sigh. decimal is so often looked at a solution to a problem it isn't
>>>> designed for. lat-lon is natively Sexagesimal -- maybe we need that dtype
>>>> :-)
>>>>
>>>> what you get from decimal is variable precision -- maybe a binary
>>>> variable precision lib is a better answer -- that would be a good thing to
>>>> have easy access to in numpy, but in this case, if you want better accuracy
>>>> in a computation that will end up in float32, just use float64.
>>>>
>>>
>>> I am not concerned about computing distances or anything like that, I am
>>> trying to properly construct my grid. I need consistent results regardless
>>> of which way the grid is specified (start/end/count, start/res/count,
>>> start/end/res). I have found that loading up the grid specs (using in a
>>> config file or command-line) using the Decimal class allows me to exactly
>>> and consistently represent the grid specification, and gets me most of the
>>> way there. But the problems with arange() is frustrating, and I have to
>>> have extra logic to go around that and over to linspace() instead.
>>>
>>>
>>>>
>>>> and I discovered the concept of "fused multiply-add" operations for
>>>>>> improved accuracy. I have come to realize that fma operations could be used
>>>>>> to greatly improve the accuracy of linspace() and arange().
>>>>>>
>>>>>
>>>> arange() is problematic for non-integer use anyway, by its very
>>>> definition (getting the "end point" correct requires the right step, even
>>>> without FP error).
>>>>
>>>> and would it really help with linspace? it's computing a delta with one
>>>> division in fp, then multiplying it by an integer (represented in fp --
>>>> why? why not keep that an integer till the multiply?).
>>>>
>>>
>>> Sorry, that was a left-over from a previous draft of my email after I
>>> discovered that linspace's accuracy was on par with fma(). And while
>>> arange() has inherent problems, it can still be made better than it is now.
>>> In fact, I haven't investigated this, but I did recently discover some unit
>>> tests of mine started to fail after a numpy upgrade, and traced it back to
>>> a reduction in the accuracy of a usage of arange() with float32s. So,
>>> something got worse at some point, which means we could still get accuracy
>>> back if we can figure out what changed.
>>>
>>>
>>>>
>>>> In particular, I have been needing improved results for computing
>>>>>> latitude/longitude grids, which tend to be done in float32's to save memory
>>>>>> (at least, this is true in data I come across).
>>>>>>
>>>>>
>>>>> If you care about saving memory *and* accuracy, wouldn't it make more
>>>>> sense to do your computations in float64, and convert to float32 at the
>>>>> end?
>>>>>
>>>>
>>>> that does seem to be the easy option :-)
>>>>
>>>
>>> Kinda missing the point, isn't it? Isn't that like saying "convert all
>>> your data to float64s prior to calling np.mean()"? That's ridiculous.
>>> Instead, we made np.mean() upcast the inner-loop operation, and even allow
>>> an option to specify what the dtype that should be used for the aggregator.
>>>
>>>
>>>>
>>>>
>>>>> Now, to the crux of my problem. It is next to impossible to generate a
>>>>>> non-trivial numpy array of coordinates, even in double precision, without
>>>>>> hitting significant numerical errors.
>>>>>>
>>>>>
>>>> I'm confused, the example you posted doesn't have significant errors...
>>>>
>>>
>>> Hmm, "errors" was the wrong word. "Differences between methods" might be
>>> more along the lines of what I was thinking. Remember, I am looking for
>>> consistency.
>>>
>>>
>>>>
>>>>
>>>>> Which has lead me down the path of using the decimal package (which
>>>>>> doesn't play very nicely with numpy because of the lack of casting rules
>>>>>> for it). Consider the following:
>>>>>> ```
>>>>>> $ cat test_fma.py
>>>>>> from __future__ import print_function
>>>>>> import numpy as np
>>>>>> res = np.float32(0.01)
>>>>>> cnt = 7001
>>>>>> x0 = np.float32(-115.0)
>>>>>> x1 = res * cnt + x0
>>>>>> print("res * cnt + x0 = %.16f" % x1)
>>>>>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>>>>>> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
>>>>>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>>>>>> print("linspace()[-1]: %.16f" % x[-1])
>>>>>>
>>>>>> $ python test_fma.py
>>>>>> res * cnt + x0 = -44.9900015648454428
>>>>>> len(arange()): 7002  arange()[-1]:       -44.975044
>>>>>> linspace()[-1]: -44.9900016784667969
>>>>>> ```
>>>>>> arange just produces silly results (puts out an extra element...
>>>>>> adding half of the resolution is typically mentioned as a solution on
>>>>>> mailing lists to get around arange()'s limitations -- I personally don't do
>>>>>> this).
>>>>>>
>>>>>
>>>> The real solution is "don't do that" arange is not the right tool for
>>>> the job.
>>>>
>>>
>>> Well, it isn't the right tool because as far as I am concerned, it is
>>> useless for anything but integers. Why not fix it to be more suitable for
>>> floating point?
>>>
>>>
>>>>
>>>> Then there is this:
>>>>
>>>> res * cnt + x0 = -44.9900015648454428
>>>> linspace()[-1]: -44.9900016784667969
>>>>
>>>> that's as good as you are ever going to get with 32 bit floats...
>>>>
>>>
>>> Consistency is the key thing. I am fine with one of those values, so
>>> long as that value is what happens no matter which way I specify my grid.
>>>
>>>
>>>>
>>>> Though I just noticed something about your numbers -- there should be a
>>>> nice even base ten delta if you have 7001 gaps -- but linspace produces N
>>>> points, not N gaps -- so maybe you want:
>>>>
>>>>
>>>> In [*17*]: l = np.linspace(-115.0, -44.99, 7002)
>>>>
>>>>
>>>> In [*18*]: l[:5]
>>>>
>>>> Out[*18*]: array([-115.  , -114.99, -114.98, -114.97, -114.96])
>>>>
>>>>
>>>> In [*19*]: l[-5:]
>>>>
>>>> Out[*19*]: array([-45.03, -45.02, -45.01, -45.  , -44.99])
>>>>
>>>>
>>>> or, in float32 -- not as pretty:
>>>>
>>>>
>>>> In [*20*]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)
>>>>
>>>>
>>>> In [*21*]: l[:5]
>>>>
>>>> Out[*21*]:
>>>>
>>>> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>>>>
>>>>        -114.95999908], dtype=float32)
>>>>
>>>>
>>>> In [*22*]: l[-5:]
>>>>
>>>> Out[*22*]: array([-45.02999878, -45.02000046, -45.00999832, -45.
>>>>   , -44.99000168], dtype=float32)
>>>>
>>>>
>>>> but still as good as you get with float32, and exactly the same result
>>>> as computing in float64 and converting:
>>>>
>>>>
>>>>
>>>> In [*25*]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)
>>>>
>>>>
>>>> In [*26*]: l[:5]
>>>>
>>>> Out[*26*]:
>>>>
>>>> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>>>>
>>>>        -114.95999908], dtype=float32)
>>>>
>>>>
>>>> In [*27*]: l[-5:]
>>>>
>>>> Out[*27*]: array([-45.02999878, -45.02000046, -45.00999832, -45.
>>>>   , -44.99000168], dtype=float32)
>>>>
>>>
>>> Argh! I got myself mixed up between specifying pixel corners versus
>>> pixel centers. rasterio has been messing me up on this.
>>>
>>>
>>>>
>>>>
>>>>>> So, does it make any sense to improve arange by utilizing fma() under
>>>>>> the hood?
>>>>>>
>>>>>
>>>> no -- this is simply not the right use-case for arange() anyway.
>>>>
>>>
>>> arange() has accuracy problems, so why not fix it?
>>>
>>> >>> l4 = np.arange(-115, -44.99, 0.01, dtype=np.float32)
>>> >>> np.median(np.diff(l4))
>>> 0.0099945068
>>> >>> np.float32(0.01)
>>> 0.0099999998
>>>
>>> There is something significantly wrong here if arange(), which takes a
>>> resolution parameter, can't seem to produce a sequence with the proper
>>> delta.
>>>
>>>
>>>
>>>>
>>>>
>>>>> Also, any plans for making fma() available as a ufunc?
>>>>>>
>>>>>
>>>> could be nice -- especially if used internally.
>>>>
>>>>
>>>>> Notice that most of my examples required knowing the number of grid
>>>>>> points ahead of time. But what if I didn't know that? What if I just have
>>>>>> the bounds and the resolution? Then arange() is the natural fit, but as I
>>>>>> showed, its accuracy is lacking, and you have to do some sort of hack to do
>>>>>> a closed interval.
>>>>>>
>>>>>
>>>> no -- it's not -- if you have the bounds and the resolution, you have
>>>> an over-specified problem. That is:
>>>>
>>>> x_min + (n * delta_x) == x_max
>>>>
>>>> If there is ANY error in either delta_x or x_max (or x_min), then
>>>> you'll get a missmatch. which is why arange is not the answer (you can make
>>>> the algorithm a bit more accurate, I suppose but there is still fp limited
>>>> precision -- if you can't exactly represent either delta_x or x_max, then
>>>> you CAN'T use the arange() definition and expect to work consistently.
>>>>
>>>> The "right" way to do it is to compute N with: round((x_max - x_min) /
>>>> delta), and then use linspace:
>>>>
>>>> linspace(x_min, x_max, N+1)
>>>>
>>>> (note that it's too bad you need to do N+1 -- if I had to do it over
>>>> again, I'd use N as the number of "gaps" rather than the number of points
>>>> -- that's more commonly what people want, if they care at all)
>>>>
>>>> This way, you get a grid with the endpoints as exact as they can be,
>>>> and the deltas as close to each-other as they can be as well.
>>>>
>>>> maybe you can do a better algorithm in linspace to save an ULP, but
>>>> it's hard to imagine when that would matter.
>>>>
>>>
>>> Yes, it is overspecified. My problem is that different tools require
>>> different specs (ahem... rasterio/gdal), and I have gird specs coming from
>>> other sources. And I need to produce data onto the same grid so that tools
>>> like xarray won't yell at me when I am trying to do an operation between
>>> gridded data that should have the same coordinates, but are off slightly
>>> because they were computed differently for whatever reason.
>>>
>>> I guess I am crying out for some sort of tool that will help the
>>> community stop making the same mistakes. A one-stop shop that'll allow us
>>> to specify a grid in a few different ways and still produce the right
>>> thing, and even do the inverse... provide a coordinate array and get grids
>>> specs in whatever form we want. Maybe even have options for dealing with
>>> pixel corner vs. pixel centers, too? There are additional fun problems such
>>> as padding out coordinate arrays, which np.pad doesn't really do a great
>>> job with.
>>>
>>> Cheers!
>>> Ben Root
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>

improving arange()? introducing fma()?

Benjamin Root

Ralf Gommers

Chris Barker

Benjamin Root

Matthew Harrigan

Eric Wieser

Benjamin Root

Matthias Geier

Chris Barker

Nils Becker

Arnaldo Russo

Nils Becker

Chris Barker

Chris Barker

Benjamin Root

Sebastian Berg

Chris Barker

tags

participants (9)