On Thu, Feb 22, 2018 at 11:57 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> First, you are right...Decimal is not the right module for this. I
> think instead I should use the 'fractions' module for loading grid
> spec information from strings (command-line, configs, etc).

I'm not sure fractions is any better (or necessary, anyway). In the end, you need floats, so the inherent limitations of floats aren't the problem. In your original use case, you wanted a 32 bit float grid in the end, so doing the calculations in 64 bit float and then downcasting is as good as you're going to get -- and it's easy and fast. And I suppose you could use 128 bit float if you want to get to 64 bit in the end -- not as easy, and Python itself doesn't have it.
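
For instance, a minimal sketch of that compute-wide-then-downcast
approach (the grid values here are made up for illustration):

    import numpy as np

    # Build the grid in 64 bit floats, then downcast: float64 rounding
    # error is far below float32 resolution, so the cast absorbs it.
    num_cells = 1000
    grid64 = np.linspace(0.0, 360.0, num_cells + 1)  # float64 by default
    grid32 = grid64.astype(np.float32)               # the grid you actually use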

>  The
> tricky part is getting the yaml reader to use it instead of
> converting to a float under the hood.

64 bit floats support about 15 decimal digits -- are your string-based sources providing more than that? If not, then the 64 bit float version is as good as it's going to get.
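
To make that concrete (a quick check, using an arbitrary 15-digit
value):

    import numpy as np

    s = "0.123456789012345"    # 15 significant digits from a config string
    x = float(s)
    print(repr(x))             # repr is the shortest string that round-trips
    print(np.finfo(np.float64).precision)   # -> 15 decimal digits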

> Second, what has been pointed out about the implementation of arange
> actually helps to explain some oddities I have encountered. In some
> situations, I have found that it was better for me to produce the
> reversed sequence, and then reverse that array back and use it.

Interesting -- though I'd still say "don't use arange" is the "correct" answer.
 
> Third, it would be nice to do what we can to improve arange()'s
> results. Would we be ok with a PR that uses fma() if it is available,
> but then falls back on a regular multiply and add if it isn't
> available, or are we going to need to implement it ourselves for
> consistency?

I would think calling fma() if supported would be fine -- if there is an easy macro to check whether it's there. I don't know if numpy has a policy about this sort of thing, but I'm pretty sure everywhere else the final details of computation fall back to the hardware/compiler/library (e.g. Intel uses extended precision fp, other platforms don't, etc.), so I can't see that having a slightly more accurate computation in arange on some platforms and not others would cause a problem. If any of the tests are checking to that level of accuracy, they should be fixed :-)
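
(For the record, fma() itself is required by C99's <math.h>; the
FP_FAST_FMA macro signals a hardware implementation, so the check is
really about speed rather than existence. Just to illustrate the
numerical point in Python -- math.fma exists from Python 3.13 on:)

    import math   # math.fma: Python 3.13+

    # fma(x, y, z) computes x*y + z with a single rounding, recovering
    # bits that a separate multiply-then-add throws away:
    a = 1.0 + 2.0**-52               # smallest double greater than 1.0
    prod = a * a                     # true value 1 + 2**-51 + 2**-104, rounded
    resid = math.fma(a, a, -prod)    # exact residual of that rounding
    print(resid == 2.0**-104)        # True

    # In arange, each element is start + i*step; computing it as
    # fma(i, step, start) rounds once instead of twice.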

>   2. It sounds *not* like a fix, but rather a
>      "make it slightly less bad, but it is still awful"

Exactly -- better accuracy is a good thing, but it's not the source of the problem here -- the source of the problem is inherent to FP, and/or a poorly specified goal. Having arange or linspace lose a couple fewer ULPs isn't going to change anything.
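
To spell out what "inherent to FP" means here:

    import numpy as np

    # The root issue is representation, not arithmetic: 0.1 has no exact
    # binary float, so a "step of 0.1" grid is already off before arange
    # performs a single operation.
    print(0.1 + 0.1 + 0.1 == 0.3)                  # False
    print(np.float64(0.1) * 3 - np.float64(0.3))   # tiny nonzero residual

    # And arange's length is ceil((stop - start) / step); that quotient
    # can land on either side of an integer, so an endpoint can appear
    # or vanish no matter how accurately each element is computed.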
 
> Using fma inside linspace might make linspace a bit more exact, and
> would be a good thing, though I am not sure we have a policy yet for
> something that is only used sometimes.

See above -- nor do I, but it seems like a fine idea to me.
 
> It also would be nice to add stable summation to numpy in general (in
> whatever way), which maybe is half related but on nobody's specific
> todo list.

I recall a conversation on this list a (long) while back about compensated summation (Kahan summation) -- I guess nothing ever came of it?
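
For reference, the idea in a few lines (Python's math.fsum already
does something stronger, and np.sum uses pairwise summation, which
helps for the common cases):

    import math

    def kahan_sum(values):
        """Compensated (Kahan) summation: carry each addition's
        rounding error along in a separate compensation term."""
        total = 0.0
        comp = 0.0                  # running compensation
        for x in values:
            y = x - comp
            t = total + y           # low-order bits of y are lost here...
            comp = (t - total) - y  # ...and recovered into comp
            total = t
        return total

    vals = [0.1] * 10**7
    print(sum(vals))                # naive sum drifts measurably
    print(kahan_sum(vals))          # compensated sum stays close
    print(math.fsum(vals))          # correctly rounded (Shewchuk) summation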

> Lastly, there definitely needs to be a better tool for grid making.
> The problem appears easy at first, but it is fraught with many
> pitfalls and subtle issues. It is easy to say, "always use
> linspace()", but if the user doesn't have the number of pixels, they
> will need to calculate that using --- gasp! -- floating point
> numbers, which could result in the wrong answer.

Agreed -- this tends to be an inherently over-specified problem:

min_value
max_value
spacing
number_of_grid_spaces

That is four values, but only three independent ones.

arange() looks like it uses: min_value, max_value, spacing -- but it doesn't really (see previous discussion), so it's not the right tool for anything.

linspace() uses: min_value, max_value, (number_of_grid_spaces + 1), which is about as good as you can get (except for that annoying +1).
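
For instance (a quick illustration of why the (number_of_grid_spaces
+ 1) form is the safe one):

    import numpy as np

    # linspace pins both endpoints and divides the interval between
    # them, so min and max are honored exactly:
    grid = np.linspace(0.0, 1.0, 10 + 1)   # 10 grid spaces -> 11 points
    print(grid[0], grid[-1])               # 0.0 1.0, exactly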

But what if you are given min_value, spacing, number_of_grid_spaces?

Maybe we need a function for that? Which I think would simply be:

    min_value + np.arange(number_of_grid_spaces + 1) * spacing

which is why we probably don't need a function :-) (Note that that
incurs only one multiply and one add of rounding error per grid point.)

Or maybe a way to take all four values and return a "best fit" grid. The problem with that is that it's over-specified, and it may not be only fp error that makes it not fit. What should the code do?
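
A sketch of one possible answer (a hypothetical helper, not an
existing numpy API, and the tolerance is a made-up knob): error out
unless the four values are consistent to within fp rounding.

    import numpy as np

    def make_grid(min_value, max_value, spacing, num_spaces, rtol=1e-9):
        """Hypothetical helper: accept all four values, but refuse a
        spec that is inconsistent beyond plain fp rounding."""
        implied = (max_value - min_value) / num_spaces
        if abs(implied - spacing) > rtol * abs(spacing):
            raise ValueError(
                "min/max/spacing/count are inconsistent: implied "
                f"spacing {implied!r} vs given {spacing!r}")
        # endpoints are honored exactly; the spacing absorbs the rounding
        return np.linspace(min_value, max_value, num_spaces + 1)

    grid = make_grid(0.0, 1.0, 0.1, 10)   # consistent spec -> fine
    # make_grid(0.0, 1.0, 0.3, 10)        # inconsistent -> ValueError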

So Ben:

What is the problem you are trying to solve? -- I'm still confused. What information do you have to define the grid? Maybe all we need are docs for how best to compute a grid with given specifications, and a pointer to them in the arange() and linspace() docstrings.

-CHB

> I once wanted to add a "step" argument to linspace, but didn't in the
> end, largely because it basically enforced, in a very convoluted way,
> that the step fit exactly to a number of steps (up to floating point
> precision), and nobody was quite sure it was a good idea, since it
> would just be useful as a little convenience when you do not want to
> calculate the number of steps.

Exactly!

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov