[Numpy-discussion] improving arange()? introducing fma()?

Matthias Geier matthias.geier at gmail.com
Sat Feb 10 05:39:33 EST 2018


I just want to add a few links related to the topic:

https://mail.python.org/pipermail/numpy-discussion/2007-September/029129.html
https://quantumwise.com/forum/index.php?topic=110.0#.VIVgyIctjRZ
https://mail.python.org/pipermail/numpy-discussion/2012-February/060238.html
http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/misc/arange.ipynb

cheers,
Matthias

On Fri, Feb 9, 2018 at 11:17 PM, Benjamin Root <ben.v.root at gmail.com> wrote:

> Interesting...
>
> ```
> static void
> @NAME@_fill(@type@ *buffer, npy_intp length, void *NPY_UNUSED(ignored))
> {
>     npy_intp i;
>     @type@ start = buffer[0];
>     @type@ delta = buffer[1];
>     delta -= start;
>     for (i = 2; i < length; ++i) {
>         buffer[i] = start + i*delta;
>     }
> }
> ```
>
> So, the second value is computed using the delta that arange was given, but
> the fill routine then tries to recover that delta by subtracting the first
> two elements, which incurs rounding error:
> ```
> >>> a = np.float32(-115)
> >>> delta = np.float32(0.01)
> >>> b = a + delta
> >>> new_delta = b - a
> >>> "%.16f" % delta
> '0.0099999997764826'
> >>> "%.16f" % new_delta
> '0.0100021362304688'
> ```
>
> Also, right there is a good example of where the use of fma() could be of
> value.
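>
> To make the chain of events concrete, here is a minimal sketch (the variable
> names are mine, not from the numpy source) showing that the damage is done
> before the fill loop even runs: once buffer[1] has been rounded, subtracting
> buffer[0] cannot give back the step that was passed in, and that recovered
> delta is what gets multiplied thousands of times (an fma in the loop would
> only save the extra rounding in start + i*delta; the recovered delta is a
> separate issue):
>
> ```
> import numpy as np
>
> a = np.float32(-115)
> step = np.float32(0.01)
> b = a + step                  # this is buffer[1]; the rounding happens here
> recovered = b - a             # what the fill routine recovers as "delta"
>
> n = 7000                      # index of the last element of a 7001-point grid
> last_recovered = a + np.float32(n) * recovered   # what the fill loop produces
> last_original = a + np.float32(n) * step         # if the true step were kept
> last_float64 = np.float64(a) + n * np.float64(step)  # higher-precision reference
>
> print(recovered, step)        # the two "deltas" already disagree
> print(last_recovered, last_original, last_float64)
> ```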
>
> Cheers!
> Ben Root
>
>
> On Fri, Feb 9, 2018 at 4:56 PM, Eric Wieser <wieser.eric+numpy at gmail.com>
> wrote:
>
>> Can’t arange and linspace operations with floats be done internally
>>
>> Yes, and they probably should be - they’re done this way as a hack
>> because the api exposed for custom dtypes is here
>> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/include/numpy/ndarraytypes.h#L507-L511>,
>> (example implementation here
>> <https://github.com/numpy/numpy/blob/81e15e812574d956fcc304c3982e2b59aa18aafb/numpy/core/src/multiarray/arraytypes.c.src#L3711-L3721>)
>> - essentially, you give it the first two elements of the array, and ask it
>> to fill in the rest.
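>>
>> A rough Python sketch of that contract (purely illustrative -- the real hook
>> is the C-level fill slot linked above, not a Python API):
>>
>> ```
>> import numpy as np
>>
>> def fill_like_arange(buffer):
>>     # buffer[0] and buffer[1] are already set by the caller; the fill
>>     # routine has to infer the step from them and fill in the rest
>>     start = buffer[0]
>>     delta = buffer[1] - buffer[0]   # the original step is no longer available
>>     for i in range(2, len(buffer)):
>>         buffer[i] = start + i * delta
>>
>> buf = np.empty(7001, dtype=np.float32)
>> buf[0] = np.float32(-115.0)
>> buf[1] = buf[0] + np.float32(0.01)
>> fill_like_arange(buf)
>> print(buf[-1])   # reflects the recovered delta, not the step that was asked for
>> ```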
>>
>> On Fri, 9 Feb 2018 at 13:17 Matthew Harrigan <harrigan.matthew at gmail.com>
>> wrote:
>>
>>> I apologize if I'm missing something basic, but why are floats being
>>> accumulated in the first place?  Can't arange and linspace operations with
>>> floats be done internally similar to `start + np.arange(num_steps) *
>>> step_size`?  I.e. always accumulate (really increment) integers to limit
>>> errors.
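>>>
>>> Something along these lines (a sketch; the float64 intermediate and the
>>> final cast are choices I'm making here, not something numpy does for you):
>>>
>>> ```
>>> import numpy as np
>>>
>>> start = np.float32(-115.0)
>>> step = np.float32(0.01)
>>> num_steps = 7001
>>>
>>> # scale an integer index once instead of repeatedly adding floats
>>> grid = start + np.arange(num_steps, dtype=np.float64) * np.float64(step)
>>> grid = grid.astype(np.float32)
>>> print(grid[0], grid[-1])   # compare the last value against the intended -45.0
>>> ```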
>>>
>>> On Fri, Feb 9, 2018 at 3:43 PM, Benjamin Root <ben.v.root at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker <chris.barker at noaa.gov>
>>>> wrote:
>>>>
>>>>> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gommers at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>  It is partly a plea for some development of numerically accurate
>>>>>>> functions for computing lat/lon grids from a combination of inputs: bounds,
>>>>>>> counts, and resolutions.
>>>>>>>
>>>>>>
>>>>> Can you be more specific about what problems you've run into? I work
>>>>> with lat-lon grids all the time and have never had a problem.
>>>>>
>>>>> float32 degrees gives you about 1 meter accuracy or better, so I can
>>>>> see how losing a few digits might be an issue, though I would argue that
>>>>> you maybe shouldn't use float32 if you are worried about anything close to
>>>>> 1m accuracy... -- or shift to a relative coordinate system of some sort.
>>>>>
>>>>
>>>> The issue isn't so much the accuracy of the coordinates themselves. I
>>>> am only worried about 1km resolution (which is approximately 0.01 degrees
>>>> at mid-latitudes). My concern is with consistent *construction* of a
>>>> coordinate grid with even spacing. As it stands right now, if I provide a
>>>> corner coordinate, a resolution, and the number of pixels, the result is
>>>> not terrible (indeed, this is the approach used by gdal/rasterio). If I
>>>> have start/end coordinates and the number of pixels, the result is not bad,
>>>> either (use linspace). But, if I have start/end coordinates and a
>>>> resolution, then determining the number of pixels from that is actually
>>>> tricky to get right in the general case, especially with float32 and large
>>>> grids, and especially if the bounding box specified isn't exactly divisible
>>>> by the resolution.
>>>>
>>>>
>>>>>
>>>>> I have been playing around with the decimal package a bit lately,
>>>>>>>
>>>>>>
>>>>> sigh. decimal is so often looked at as a solution to a problem it isn't
>>>>> designed for. lat-lon is natively sexagesimal -- maybe we need that dtype
>>>>> :-)
>>>>>
>>>>> what you get from decimal is variable precision -- maybe a binary
>>>>> variable precision lib is a better answer -- that would be a good thing to
>>>>> have easy access to in numpy, but in this case, if you want better accuracy
>>>>> in a computation that will end up in float32, just use float64.
>>>>>
>>>>
>>>> I am not concerned about computing distances or anything like that, I
>>>> am trying to properly construct my grid. I need consistent results
>>>> regardless of which way the grid is specified (start/end/count,
>>>> start/res/count, start/end/res). I have found that loading up the grid
>>>> specs (whether from a config file or the command line) using the Decimal
>>>> class allows me to exactly and consistently represent the grid
>>>> specification, and gets me most of the way there. But the problems with
>>>> arange() are frustrating, and I have to add extra logic to work around it
>>>> and use linspace() instead.
>>>>
>>>>
>>>>>
>>>>> and I discovered the concept of "fused multiply-add" operations for
>>>>>>> improved accuracy. I have come to realize that fma operations could be used
>>>>>>> to greatly improve the accuracy of linspace() and arange().
>>>>>>>
>>>>>>
>>>>> arange() is problematic for non-integer use anyway, by its very
>>>>> definition (getting the "end point" correct requires the right step, even
>>>>> without FP error).
>>>>>
>>>>> and would it really help with linspace? it's computing a delta with
>>>>> one division in fp, then multiplying it by an integer (represented in fp --
>>>>> why? why not keep that an integer till the multiply?).
>>>>>
>>>>
>>>> Sorry, that was a left-over from a previous draft of my email after I
>>>> discovered that linspace's accuracy was on par with fma(). And while
>>>> arange() has inherent problems, it can still be made better than it is now.
>>>> In fact, I haven't investigated this, but I did recently discover some unit
>>>> tests of mine started to fail after a numpy upgrade, and traced it back to
>>>> a reduction in the accuracy of a usage of arange() with float32s. So,
>>>> something got worse at some point, which means we could still get accuracy
>>>> back if we can figure out what changed.
>>>>
>>>>
>>>>>
>>>>> In particular, I have been needing improved results for computing
>>>>>>> latitude/longitude grids, which tend to be done in float32's to save memory
>>>>>>> (at least, this is true in data I come across).
>>>>>>>
>>>>>>
>>>>>> If you care about saving memory *and* accuracy, wouldn't it make more
>>>>>> sense to do your computations in float64, and convert to float32 at the
>>>>>> end?
>>>>>>
>>>>>
>>>>> that does seem to be the easy option :-)
>>>>>
>>>>
>>>> Kinda missing the point, isn't it? Isn't that like saying "convert all
>>>> your data to float64s prior to calling np.mean()"? That's ridiculous.
>>>> Instead, we made np.mean() upcast the inner-loop operation, and even allow
>>>> an option to specify the dtype that should be used for the aggregator.
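>>>>
>>>> For reference, that option is the dtype argument to mean() -- the float32
>>>> array stays float32, only the accumulation is widened:
>>>>
>>>> ```
>>>> import numpy as np
>>>>
>>>> x = np.random.rand(1000000).astype(np.float32)
>>>> print(x.mean())                   # float32 in, float32 result
>>>> print(x.mean(dtype=np.float64))   # accumulate in float64, no manual upcast of x
>>>> ```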
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Now, to the crux of my problem. It is next to impossible to generate
>>>>>>> a non-trivial numpy array of coordinates, even in double precision, without
>>>>>>> hitting significant numerical errors.
>>>>>>>
>>>>>>
>>>>> I'm confused, the example you posted doesn't have significant errors...
>>>>>
>>>>
>>>> Hmm, "errors" was the wrong word. "Differences between methods" might
>>>> be more along the lines of what I was thinking. Remember, I am looking for
>>>> consistency.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>> Which has led me down the path of using the decimal package (which
>>>>>>> doesn't play very nicely with numpy because of the lack of casting rules
>>>>>>> for it). Consider the following:
>>>>>>> ```
>>>>>>> $ cat test_fma.py
>>>>>>> from __future__ import print_function
>>>>>>> import numpy as np
>>>>>>> res = np.float32(0.01)
>>>>>>> cnt = 7001
>>>>>>> x0 = np.float32(-115.0)
>>>>>>> x1 = res * cnt + x0
>>>>>>> print("res * cnt + x0 = %.16f" % x1)
>>>>>>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>>>>>>> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
>>>>>>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>>>>>>> print("linspace()[-1]: %.16f" % x[-1])
>>>>>>>
>>>>>>> $ python test_fma.py
>>>>>>> res * cnt + x0 = -44.9900015648454428
>>>>>>> len(arange()): 7002  arange()[-1]:       -44.975044
>>>>>>> linspace()[-1]: -44.9900016784667969
>>>>>>> ```
>>>>>>> arange just produces silly results (puts out an extra element...
>>>>>>> adding half of the resolution is typically mentioned as a solution on
>>>>>>> mailing lists to get around arange()'s limitations -- I personally don't do
>>>>>>> this).
>>>>>>>
>>>>>>
>>>>> The real solution is "don't do that" -- arange is not the right tool for
>>>>> the job.
>>>>>
>>>>
>>>> Well, it isn't the right tool because as far as I am concerned, it is
>>>> useless for anything but integers. Why not fix it to be more suitable for
>>>> floating point?
>>>>
>>>>
>>>>>
>>>>> Then there is this:
>>>>>
>>>>> res * cnt + x0 = -44.9900015648454428
>>>>> linspace()[-1]: -44.9900016784667969
>>>>>
>>>>> that's as good as you are ever going to get with 32 bit floats...
>>>>>
>>>>
>>>> Consistency is the key thing. I am fine with one of those values, so
>>>> long as that value is what happens no matter which way I specify my grid.
>>>>
>>>>
>>>>>
>>>>> Though I just noticed something about your numbers -- there should be
>>>>> a nice even base ten delta if you have 7001 gaps -- but linspace produces N
>>>>> points, not N gaps -- so maybe you want:
>>>>>
>>>>>
>>>>> In [17]: l = np.linspace(-115.0, -44.99, 7002)
>>>>>
>>>>> In [18]: l[:5]
>>>>> Out[18]: array([-115.  , -114.99, -114.98, -114.97, -114.96])
>>>>>
>>>>> In [19]: l[-5:]
>>>>> Out[19]: array([-45.03, -45.02, -45.01, -45.  , -44.99])
>>>>>
>>>>> or, in float32 -- not as pretty:
>>>>>
>>>>> In [20]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)
>>>>>
>>>>> In [21]: l[:5]
>>>>> Out[21]:
>>>>> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>>>>>        -114.95999908], dtype=float32)
>>>>>
>>>>> In [22]: l[-5:]
>>>>> Out[22]: array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
>>>>>        -44.99000168], dtype=float32)
>>>>>
>>>>> but still as good as you get with float32, and exactly the same result
>>>>> as computing in float64 and converting:
>>>>>
>>>>> In [25]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)
>>>>>
>>>>> In [26]: l[:5]
>>>>> Out[26]:
>>>>> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>>>>>        -114.95999908], dtype=float32)
>>>>>
>>>>> In [27]: l[-5:]
>>>>> Out[27]: array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
>>>>>        -44.99000168], dtype=float32)
>>>>>
>>>>
>>>> Argh! I got myself mixed up between specifying pixel corners versus
>>>> pixel centers. rasterio has been messing me up on this.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>> So, does it make any sense to improve arange by utilizing fma()
>>>>>>> under the hood?
>>>>>>>
>>>>>>
>>>>> no -- this is simply not the right use-case for arange() anyway.
>>>>>
>>>>
>>>> arange() has accuracy problems, so why not fix it?
>>>>
>>>> >>> l4 = np.arange(-115, -44.99, 0.01, dtype=np.float32)
>>>> >>> np.median(np.diff(l4))
>>>> 0.0099945068
>>>> >>> np.float32(0.01)
>>>> 0.0099999998
>>>>
>>>> There is something significantly wrong here if arange(), which takes a
>>>> resolution parameter, can't seem to produce a sequence with the proper
>>>> delta.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Also, any plans for making fma() available as a ufunc?
>>>>>>>
>>>>>>
>>>>> could be nice -- especially if used internally.
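>>>>>
>>>>> For float32 inputs you can get close to fma by hand in the meantime, since
>>>>> the float64 product of two float32 values is exact. This is only a sketch,
>>>>> not a true fused multiply-add (the double rounding through float64 can
>>>>> differ in rare corner cases):
>>>>>
>>>>> ```
>>>>> import numpy as np
>>>>>
>>>>> def fma32(a, b, c):
>>>>>     # a*b is exact in float64 for float32 inputs; only the final cast rounds
>>>>>     return np.float32(np.float64(a) * np.float64(b) + np.float64(c))
>>>>>
>>>>> print(fma32(np.float32(7000), np.float32(0.01), np.float32(-115.0)))
>>>>> ```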
>>>>>
>>>>>
>>>>>> Notice that most of my examples required knowing the number of grid
>>>>>>> points ahead of time. But what if I didn't know that? What if I just have
>>>>>>> the bounds and the resolution? Then arange() is the natural fit, but as I
>>>>>>> showed, its accuracy is lacking, and you have to do some sort of hack to do
>>>>>>> a closed interval.
>>>>>>>
>>>>>>
>>>>> no -- it's not -- if you have the bounds and the resolution, you have
>>>>> an over-specified problem. That is:
>>>>>
>>>>> x_min + (n * delta_x) == x_max
>>>>>
>>>>> If there is ANY error in either delta_x or x_max (or x_min), then
>>>>> you'll get a mismatch, which is why arange is not the answer (you can make
>>>>> the algorithm a bit more accurate, I suppose, but there is still fp-limited
>>>>> precision -- if you can't exactly represent either delta_x or x_max, then
>>>>> you CAN'T use the arange() definition and expect it to work consistently).
>>>>>
>>>>> The "right" way to do it is to compute N with: round((x_max - x_min) /
>>>>> delta), and then use linspace:
>>>>>
>>>>> linspace(x_min, x_max, N+1)
>>>>>
>>>>> (note that it's too bad you need to do N+1 -- if I had to do it over
>>>>> again, I'd use N as the number of "gaps" rather than the number of points
>>>>> -- that's more commonly what people want, if they care at all)
>>>>>
>>>>> This way, you get a grid with the endpoints as exact as they can be,
>>>>> and the deltas as close to each-other as they can be as well.
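>>>>>
>>>>> Spelled out as a small helper (the function name is just for this example):
>>>>>
>>>>> ```
>>>>> import numpy as np
>>>>>
>>>>> def grid_from_bounds_and_res(x_min, x_max, delta, dtype=np.float32):
>>>>>     n_gaps = int(round((x_max - x_min) / delta))  # number of gaps, not points
>>>>>     return np.linspace(x_min, x_max, n_gaps + 1, dtype=dtype)
>>>>>
>>>>> x = grid_from_bounds_and_res(-115.0, -44.99, 0.01)
>>>>> print(len(x), x[0], x[-1])   # 7002 points; endpoints hit the bounds (to float32 precision)
>>>>> ```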
>>>>>
>>>>> maybe you can do a better algorithm in linspace to save an ULP, but
>>>>> it's hard to imagine when that would matter.
>>>>>
>>>>
>>>> Yes, it is overspecified. My problem is that different tools require
>>>> different specs (ahem... rasterio/gdal), and I have grid specs coming from
>>>> other sources. And I need to produce data onto the same grid so that tools
>>>> like xarray won't yell at me when I am trying to do an operation between
>>>> gridded data that should have the same coordinates, but are off slightly
>>>> because they were computed differently for whatever reason.
>>>>
>>>> I guess I am crying out for some sort of tool that will help the
>>>> community stop making the same mistakes. A one-stop shop that'll allow us
>>>> to specify a grid in a few different ways and still produce the right
>>>> thing, and even do the inverse... provide a coordinate array and get grid
>>>> specs in whatever form we want. Maybe even have options for dealing with
>>>> pixel corner vs. pixel centers, too? There are additional fun problems such
>>>> as padding out coordinate arrays, which np.pad doesn't really do a great
>>>> job with.
>>>>
>>>> Cheers!
>>>> Ben Root
>>>>


More information about the NumPy-Discussion mailing list