[Numpy-discussion] improving arange()? introducing fma()?

Chris Barker chris.barker at noaa.gov
Thu Feb 22 14:02:29 EST 2018


@Ben: Have you found a solution to your problem? Are there things we could
do in numpy to make it better?

-CHB


On Mon, Feb 12, 2018 at 9:33 AM, Chris Barker <chris.barker at noaa.gov> wrote:

> I think it's all been said, but a few comments:
>
> On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker <nilsc.becker at gmail.com>
> wrote:
>
>> Generating equidistantly spaced grids is simply not always possible.
>>
>
> exactly -- and linspace gives pretty much the best possible result,
> guaranteeing that the start and end points are exact, and the spacing is
> within an ULP or two (maybe we could make it within 1 ULP always, but I'm
> not sure that's worth it).
>
>
>> The reason is that the absolute spacing of the possible floating point
>> numbers depends on their magnitude [1].
>>
>
> Also, the exact spacing may not be exactly representable in FP -- so you
> have to have at least one interval that's a bit off to get the end points
> right (or else the endpoints won't be exact).
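>
> For anyone following along, the magnitude dependence is easy to see
> (approximate values; they'll differ slightly by platform):
>
> import numpy as np
>
> print(np.spacing(1.0))    # ~2.2e-16 -- one ULP at 1.0
> print(np.spacing(1e6))    # ~1.2e-10 -- ULPs get coarser as magnitude grows
> print(np.spacing(1e-6))   # ~2.1e-22 -- and finer as it shrinks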
>
>
>> If you - for some reason - want the same grid spacing everywhere you may
>> choose an appropriate new spacing.
>>
>
> well, yeah, but usually you are trying to fit to some other constraint.
> I'm still confused as to where these couple of ULPs actually cause
> problems, unless you are doing inappropriate FP comparisons elsewhere.
>
>> Curiously, either by design or accident, arange() seems to do something
>> similar as was mentioned by Eric. It creates a new grid spacing by adding
>> and subtracting the starting point of the grid. This often has similar
>> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around
>> 0.0). Consequently, arange() seems to trade keeping the grid spacing
>> constant for a larger error in the grid size and consequently in the end
>> point.
>>
>
> interesting -- but it actually makes sense -- that is the definition of
> arange(), borrowed from range(), which was designed for integers, and, in
> fact, pretty much mirrored the classic C index for loop:
>
>
> for (int i=0; i<N; i++) {
>     ...
> }
>
>
> or in python:
>
> values = []
> i = start
> while i < stop:      # the termination test is the defining property
>     values.append(i)
>     i += step
>
> The problem here is the termination criterion -- i < stop -- that is the
> definition of the function, and it works just fine for integers (where it
> came from). But with FP, even with no error accumulation, stop may not be
> exactly representable, so you could end up with a last item that is about
> (stop - step), or you could end up with a last item that is a couple ULPs
> less than stop -- essentially including the end point when you weren't
> supposed to.
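>
> The classic illustration (untested here, and the exact behavior can depend
> on the platform's rounding):
>
> import numpy as np
>
> print(np.arange(1.0, 1.3, 0.1))
> # may give [1.  1.1  1.2  1.3] -- four points, because (1.3 - 1.0) / 0.1
> # rounds to just over 3, so the "end point" sneaks in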
>
> The truth is, making a floating point range() was simply a bad idea to
> begin with -- it's not the way to define a range of numbers in floating
> point. Which is why the docs now say "When using a non-integer step, such
> as 0.1, the results will often not be consistent. It is better to use
> ``linspace`` for these cases."
>
> Ben wants a way to define grids in a consistent way -- makes sense. And
> yes, sometimes the original source you are trying to match (like GDAL)
> provides a starting point and step. But with FP, that is simply
> problematic. If:
>
> start + step * num_steps != stop
>
> exactly in floating point, then you'll need to do the math one way or
> another to get what you want -- and I'm not sure anyone but the user knows
> what they want -- do you want step to be as exact as possible, or do you
> want stop to be as exact as possible?
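>
> A concrete (if trivial) example of the mismatch, untested:
>
> start, step, num_steps = 0.0, 0.1, 7
> stop = 0.7
> print(start + step * num_steps == stop)  # False
> print(start + step * num_steps)          # 0.7000000000000001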
>
> All that being said -- if arange() could be made a tiny bit more accurate
> with fma or some other numerical technique, why not? It won't solve the
> problem, but if someone writes and tests the code (and it does not require
> compiler or hardware features that aren't supported everywhere numpy
> compiles), then sure. (Same for linspace, though I'm not sure it's possible.)
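>
> (Not fma itself, but a rough, untested illustration of why the formulation
> matters: repeated accumulation drifts, while one multiply + one add per
> point stays within an ULP or two -- fma would shave that further by fusing
> the multiply and add into a single rounding.)
>
> import numpy as np
>
> start, step, n = 0.0, 0.1, 10000
>
> acc = np.empty(n)
> x = start
> for i in range(n):
>     acc[i] = x
>     x += step            # error accumulates with every addition
>
> direct = start + step * np.arange(n)   # one rounding per multiply/add
>
> print(np.abs(acc - direct).max())      # the accumulated drift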
>
> There is one other option: a new function (or option) that makes a grid
> from a specification of: start, step, num_points. If that is really a
> common use case (that is, you don't care exactly what the end-point is),
> then it might be handy to have it as a utility.
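>
> Something like this, say (grid_from_step is a made-up name, untested):
>
> import numpy as np
>
> def grid_from_step(start, step, num_points):
>     # the step is taken at face value; the end point is simply
>     # whatever start + step * (num_points - 1) comes out to
>     return start + step * np.arange(num_points)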
>
> We could also have an arange-like function that, rather than stopping at
> < stop, would stop "close to" stop. Someone who understands FP better than
> I do might be able to compute what the expected error is, and find the
> closest end point within that error. But I think that's a bad specification
> -- (stop - start) / step may be nowhere near an integer -- then what is the
> function supposed to do??
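>
> For what it's worth, a rough, untested sketch of that idea (arange_close
> and the rtol default are made up) -- it still has to punt when the ratio
> isn't near an integer:
>
> import numpy as np
>
> def arange_close(start, stop, step, rtol=1e-9):
>     n = (stop - start) / step
>     n_round = round(n)
>     if n_round > 0 and abs(n - n_round) <= rtol * n_round:
>         num = n_round + 1        # stop is "close enough": include it
>     else:
>         num = int(np.ceil(n))    # otherwise, plain arange behaviour
>     return start + step * np.arange(num)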
>
>
> BTW: I kind of wish that linspace specified the number of steps, rather
> than the number of points -- that is, num_points - 1 -- that would save a
> little bit of logical thinking. So if something new is made, let's keep
> that in mind.
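>
> i.e. something like this hypothetical wrapper (name made up):
>
> import numpy as np
>
> def linspace_steps(start, stop, num_steps):
>     # specify the number of intervals; linspace wants the number of points
>     return np.linspace(start, stop, num_steps + 1)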
>
>
>
>> 1. Comparison to calculations with decimal can be difficult as not all
>> simple decimal step sizes are exactly representable as finite floating
>> point numbers.
>>
>
> yeah, this is what I mean by inappropriate use of Decimal -- decimal is
> not inherently "more accurate" than fp -- it just can represent _decimal_
> numbers exactly, which we are all used to -- we want 1 / 10 to be exact,
> but don't mind that 1 / 3 isn't.
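>
> e.g. (untested, but this is just the stdlib decimal module):
>
> from decimal import Decimal
>
> print(Decimal("0.1"))           # exactly 0.1 -- the radix matches
> print(Decimal(0.1))             # the binary double hiding behind 0.1:
>                                 # 0.1000000000000000055511151231257827...
> print(Decimal(1) / Decimal(3))  # still rounded -- just rounded in base 10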
>
> Decimal also provides variable precision -- so it can be handy for that. I
> kinda wish Python had an arbitrary-precision binary floating point built
> in...
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov