[Numpy-discussion] improving arange()? introducing fma()?

Mon Feb 12 12:33:23 EST 2018

I think it's all been said, but a few comments:

On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker <nilsc.becker at gmail.com> wrote:

> Generating equidistantly spaced grids is simply not always possible.
>

exactly -- and linspace gives pretty much teh best possible result,
guaranteeing tha tthe start an end points are exact, and the spacing is
within an ULP or two (maybe we could make that within 1 ULP always, but not
sure that's worth it).

> The reason is that the absolute spacing of the possible floating point
> numbers depends on their magnitude [1].
>

Also that the exact spacing may not be exactly representable in FP -- so
you have to have at least one space that's a bit off to get the end points
right (or have the endpoints not exact).

> If you - for some reason - want the same grid spacing everywhere you may
> choose an appropriate new spacing.
>

well, yeah, but usually you are trying to fit to some other constraint. I'm
still confused as to where these couple of ULPs actually cause problems,
unless you are doing in appropriate FP comparisons elsewhere.

Curiously, either by design or accident, arange() seems to do something
> similar as was mentioned by Eric. It creates a new grid spacing by adding
> and subtracting the starting point of the grid. This often has similar
> effect as adding and subtracting N*dx (e.g. if the grid is symmetric around
> 0.0). Consequently, arange() seems to trade keeping the grid spacing
> constant for a larger error in the grid size and consequently in the end
> point.
>

interesting -- but it actually makes sense -- that is the definition of
arange(), borrowed from range(), which was designed for integers, and, in
fact, pretty much mirroered the classic C index for loop:

for (int i=0; i<N; i++) {
...

or in python:

i = start
while i < stop:
    i += step

The problem here is that termination criteria -- i < stop -- that is the
definition of the function, and works just fine for integers (where it came
from), but with FP, even with no error accumulation, stop may not be
exactly representable, so you could end up with a value for your last item
that is about (stop-step), or you could end up with a value that is a
couple ULPs less than step -- essentially including the end point when you
weren't supposed to.

The truth is, making a floating point range() was simply a bad idea to
begin with -- it's not the way to define a range of numbers in floating
point. Whiuch is why the docs now say "When using a non-integer step, such
as 0.1, the results will often not
be consistent.  It is better to use ``linspace`` for these cases."

Ben wants a way to define grids in a consistent way -- make sense. And yes,
sometimes, the original source you are trying to match (like GDAL) provides
a starting point and step. But with FP, that is simply problematic. If:

start + step*num_steps != stop

exactly in floating point, then you'll need to do the math one way or
another to get what you want -- and I'm not sure anyone but the user knows
what they want -- do you want step to be as exact as possible, or do you
want stop to be as exact as possible?

All that being said -- if arange() could be made a tiny bit more accurate
with fma or other numerical technique, why not? it won't solve the problem,
but if someone writes and tests the code (and it does not require compiler
or hardware features that aren't supported everywhere numpy compiles), then
sure. (Same for linspace, though I'm not sure it's possible)

There is one other option: a new function (or option) that makes a grid
from a specification of: start, step, num_points. If that is really a
common use case (that is, you don't care exactly what the end-point is),
then it might be handy to have it as a utility.

We could also have an arange-like function that, rather than < stop, would
do "close to" stop. Someone that understands FP better than I might be able
to compute what the expected error might be, and find the closest end point
within that error. But I think that's a bad specification -- (stop - start)
/ step may be nowhere near an integer -- then what is the function supposed
to do??

BTW: I kind of wish that linspace specified the number of steps, rather
than the number of points, that is (num+points - 1) that would save a
little bit of logical thinking. So if something new is made, let's keep
that in mind.

> 1. Comparison to calculations with decimal can be difficult as not all
> simple decimal step sizes are exactly representable as
>
finite floating point numbers.
>

yeah, this is what I mean by inappropriate use of Decimal -- decimal is not
inherently "more accurate" than fp -- is just can represent _decimal_
numbers exactly, which we are all use to -- we want  1 / 10 to be exact,
but dont mind that 1 / 3 isn't.

Decimal also provided variable precision -- so it can be handy for that. I
kinda wish Python had an arbitrary precision binary floating point built
in...

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180212/1e2ab11a/attachment.html>