step parameter for linspace
Hi,

there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers. As a trade-off, I was thinking of a step parameter that is only used to calculate the integer number of steps. However, to be certain it never misbehaves, this conversion would be strict up to the numerical precision of the (float) numbers. Effectively this means:

In [9]: np.linspace(0, 1.2, step=0.3)
Out[9]: array([ 0. , 0.3, 0.6, 0.9, 1.2])

In [10]: np.linspace(0, 1.2+5-5, step=0.3)
Out[10]: array([ 0. , 0.3, 0.6, 0.9, 1.2])

In [11]: np.linspace(0, 1.2+500-500, step=0.3)
ValueError: could not determine exact number of samples for given step

I.e. the last one fails, because 1.2 + 500 - 500 == 1.1999999999999886, an error that is larger than the imprecision of floating point numbers. Is this considered useful, or, since it can easily fail for calculated numbers and is thus only a convenience, is it not?

Regards,

Sebastian
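A minimal sketch of how such a strict conversion could work (the function name and the exact ULP-based tolerance are illustrative assumptions, not an actual NumPy API):

    import numpy as np

    def linspace_with_step(start, stop, step):
        # The step is only used to compute the number of samples; the
        # conversion is strict, so anything that is not an (almost) exact
        # multiple of step within a few ULPs is refused.
        ratio = (stop - start) / step
        num = int(round(ratio))
        if num < 1 or abs(ratio - num) > 8 * np.finfo(float).eps * max(abs(ratio), 1.0):
            raise ValueError(
                "could not determine exact number of samples for given step")
        return np.linspace(start, stop, num + 1)

    # linspace_with_step(0, 1.2, 0.3)              -> array([ 0. , 0.3, 0.6, 0.9, 1.2])
    # linspace_with_step(0, 1.2 + 500 - 500, 0.3)  -> ValueError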
On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers.
How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option?

My usual hack to deal with the numerical bounds issue is to add/subtract half the step.

Henry
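For illustration, the half-step hack referred to above is typically just a nudge of the stop value, something like the following sketch:

    import numpy as np

    # Nudge the stop value by half a step so that floating point rounding
    # cannot flip the intended last element in or out of the result.
    start, stop, step = 0.0, 1.2, 0.3
    x = np.arange(start, stop + step / 2, step)
    # x -> array([ 0. , 0.3, 0.6, 0.9, 1.2]); the endpoint is included (up to rounding)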
On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote:
On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers.
How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option?
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
There is not much. It does that half step logic for you, and you actually know that the end point is exact (since linspace makes sure of that). In arange, the start and step are exact. In linspace, the start and stop are exact (even with a given step, the actual step would only vary on the order of floating point accuracy). Maybe the larger point is the hope that, by adding this to linspace, it becomes easier to steer new users towards it and away from the floating point pitfalls of arange, which bite when you are not aware of that half step trick.
Henry
On Fri, 2013-03-01 at 13:44 +0100, Sebastian Berg wrote:
On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote:
On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers.
How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option?
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
There is not much. It does that half step logic for you, and you actually know that the end point is exact (since linspace makes sure of that).
In arange, the start and step are exact. In linspace the start and stop are exact (even with a given step, it would vary on the order of floating point accuracy).
Maybe the larger point is the hope that by adding this to linspace it is easier to get new users to use it and avoid pitfalls of arange with floating points when you are not aware of that half step thing.
That said, I am honestly not sure this is worth it. I guess I might use it once in a while, but overall probably hardly at all and it is easy to do something else...
Henry
On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall <heng@cantab.net> wrote:
On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers.
How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option?
arange is designed for ints and gives you a half-open interval, linspace is designed for floats and gives you a closed interval. This means that when arange is used on floats, it does weird things that linspace doesn't:

In [11]: eps = np.finfo(float).eps

In [12]: np.arange(0, 1, step=0.2)
Out[12]: array([ 0. , 0.2, 0.4, 0.6, 0.8])

In [13]: np.arange(0, 1 + eps, step=0.2)
Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])

In [14]: np.linspace(0, 1, 6)
Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])

In [15]: np.linspace(0, 1 + eps, 6)
Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])

The half-open/closed thing also has effects on what kind of api is reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like python range(0, 10, 8)). linspace(0, 1, step=0.8) is just incoherent, though, because linspace guarantees that both the start and end points are included.
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-).

The problem is to figure out exactly how strict we should be. Like, presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 to 0.5 or 1. That would clearly violate "in the face of ambiguity, refuse the temptation to guess".

OTOH, as Sebastian points out, requiring that the step be *exactly* a divisor of the value (stop - start), within 1 ULP, is probably obnoxious.

Would anything bad happen if we just required that, say, (stop - start)/step had to be within "np.allclose" of an integer, i.e., to some reasonable relative and absolute precision, and then rounded the number of steps to match that integer exactly?

-n
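A rough sketch of that relaxed rule, for illustration only (the function name, tolerances and error message are assumptions, not a settled API):

    import numpy as np

    def linspace_step(start, stop, step):
        # Accept the step if (stop - start)/step is close to an integer in
        # the np.allclose sense, then round the number of intervals to that
        # integer exactly; otherwise refuse to guess.
        ratio = (stop - start) / step
        nintervals = int(round(ratio))
        if nintervals < 1 or not np.isclose(ratio, nintervals, rtol=1e-5, atol=1e-8):
            raise ValueError("step does not evenly divide stop - start")
        return np.linspace(start, stop, nintervals + 1)

    # linspace_step(0, 1, 0.2)               -> array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
    # linspace_step(0, 1.2 + 500 - 500, 0.3) -> accepted here, unlike the strict version above
    # linspace_step(0, 1, 0.8)               -> ValueError (ratio is 1.25)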
On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote:
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-).
I agree with the sentiment (I sometimes wish a library could read my mind ;) but putting this sort of logic into the library seems dangerous to me. The point is that the coder _should_ understand the subtleties of floating point numbers. IMO arange _should_ be well specified and actually operate on the half open interval; continuing to add a step until >= the limit is clear and always unambiguous.

Unfortunately, the docs tell me that this isn't the case: "For floating point arguments, the length of the result is ``ceil((stop - start)/step)``. Because of floating point overflow, this rule may result in the last element of `out` being greater than `stop`."

In my jet-lag addled state, I can't see when this out[-1] > stop case will occur, but I can take it as true. It does seem to be problematic though.

As soon as you allow freeform setting of the stop value, problems are going to be encountered. Who's to say that stop - delta is actually _meant_ to be below the limit, or is meant to be the limit? Certainly not the library!

It just seems to me that this will lead to lots of bad code in which the writer has glossed over an ambiguous case.

Henry
On 3/1/13, Henry Gomersall <heng@cantab.net> wrote:
On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote:
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-).
I agree with the sentiment (I sometimes wish a library could read my mind ;) but putting this sort of logic into the library seems dangerous to me.
The point is that the coder _should_ understand the subtleties of floating point numbers. IMO arange _should_ be well specified and actually operate on the half open interval; continuing to add a step until >= the limit is clear and always unambiguous.
Unfortunately, the docs tell me that this isn't the case: "For floating point arguments, the length of the result is ``ceil((stop - start)/step)``. Because of floating point overflow, this rule may result in the last element of `out` being greater than `stop`."
In my jet-lag addled state, I can't see when this out[-1] > stop case will occur, but I can take it as true. It does seem to be problematic though.
Here you go:

In [32]: end = 2.2

In [33]: x = arange(0.1, end, 0.3)

In [34]: x[-1]
Out[34]: 2.2000000000000006

In [35]: x[-1] > end
Out[35]: True

Warren
As soon as you allow freeform setting of the stop value, problems are going to be encountered. Who's to say that the stop - delta is actually _meant_ to be below the limit, or is meant to be the limit? Certainly not the library!
It just seems to me that this will lead to lots of bad code in which the writer has glossed over an ambiguous case.
Henry
On Fri, 2013-03-01 at 09:24 -0500, Warren Weckesser wrote:
In my jet-lag addled state, i can't see when this out[-1] > stop case will occur, but I can take it as true. It does seem to be problematic though.
Here you go:
In [32]: end = 2.2
In [33]: x = arange(0.1, end, 0.3)
Thanks! I'll assert then that there should be an equivalent for floats that unambiguously returns a range for the half open interval. IMO this is more useful than a hacky version of linspace. Henry
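A minimal sketch of such a half-open float range (the name and the strict "< stop" rule are assumptions for illustration; negative steps are not handled):

    import numpy as np

    def frange(start, stop, step):
        # Compute the number of steps once, generate the values by
        # multiplication rather than accumulation, and then enforce the
        # half-open interval strictly: anything that reaches stop is dropped.
        nsteps = int(np.ceil((stop - start) / step))
        out = start + step * np.arange(nsteps)
        return out[out < stop]

    # frange(0.1, 2.2, 0.3) -> 7 values ending near 1.9; the 2.2000000000000006
    # that np.arange produces in Warren's example above is excluded.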
On Fri, 2013-03-01 at 14:32 +0000, Henry Gomersall wrote:
I'll assert then that there should be an equivalent for floats that unambiguously returns a range for the half open interval. IMO this is more useful than a hacky version of linspace.
And, no, I haven't thought carefully about how to handle a negative step. Henry
On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote:
On 3/1/2013 9:32 AM, Henry Gomersall wrote:
there should be an equivalent for floats that unambiguously returns a range for the half open interval
If I've understood you: start + stepsize*np.arange(nsteps)
yes, except that nsteps is computed for you, otherwise you could just use linspace ;) hen
On Fri, 2013-03-01 at 15:07 +0000, Henry Gomersall wrote:
On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote:
On 3/1/2013 9:32 AM, Henry Gomersall wrote:
there should be an equivalent for floats that unambiguously returns a range for the half open interval
If I've understood you: start + stepsize*np.arange(nsteps)
yes, except that nsteps is computed for you, otherwise you could just use linspace ;)
If you could just use linspace, you should use linspace (and give it a step argument) in my opinion, but I don't think you meant that ;). linspace holds start and stop exact and guarantees that you actually get to stop. Even a modified/new arange will never do that, but I think many use arange like that, and giving linspace a step argument could migrate that usage (which is simply ill defined for arange) to it. That might give an error once in a while, but that should happen much less often and be much more enlightening than a sudden "one value too much".

I think the accuracy requirements for the step in linspace can probably be relaxed enough, though I am not quite certain yet as to how (there is a bit of a trade-off/problem when you get to a very large number of steps).
hen
One motivation of this thread was that adding a step parameter to linspace might make things easier for beginners. I claim this thread has put the lie to that, starting with the initial post. So what is the persuasive case for the change? Imo, the current situation is good: use arange if you want to specify the stepsize, or use linspace if you want to specify the number of points. Nice and clean. Cheers, Alan Isaac
On Fri, 2013-03-01 at 10:49 -0500, Alan G Isaac wrote:
One motivation of this thread was that adding a step parameter to linspace might make things easier for beginners.
I claim this thread has put the lie to that, starting with the initial post. So what is the persuasive case for the change?
Imo, the current situation is good: use arange if you want to specify the stepsize, or use linspace if you want to specify the number of points. Nice and clean.
Maybe you are right, and it is not easier. But there was a "please include an end_point=True/False option to arange" request, and that does not make sense by arange logic. The strictness of the initial example can be relaxed quite a bit, I am sure, though I guess you may always have an odd case here or there with floats.

I agree the difference is nice and clean right now, but I disagree that this would change much. Arange guarantees the step size, linspace the end point. There is a bit of a shift, but if I thought it was less clean I would not have asked if it is deemed useful :).

At this time it seems there is more sentiment against it and that is fine with me. I thought it might be useful for some who normally want the linspace behavior, but do not want to worry about the right num in some cases. Someone who actually wants an error if the step they put in quickly (and which they would have used to calculate num) was wrong.
Cheers, Alan Isaac
On Fri, 2013-03-01 at 17:29 +0100, Sebastian Berg wrote:
At this time it seems there is more sentiment against it and that is fine with me. I thought it might be useful for some who normally want the linspace behavior, but do not want to worry about the right num in some cases. Someone who actually wants an error if the step they put in quickly (and which they would have used to calculate num) was wrong.
Actually, I buy this could be useful. I think it's helpful to think about the potential problems though. Henry
On Mar 1, 2013, at 8:39 AM, Henry Gomersall <heng@cantab.net> wrote:
On Fri, 2013-03-01 at 17:29 +0100, Sebastian Berg wrote: [...]

Actually, I buy this could be useful.
Yes, it could. How about a "farange", designed for floating point values -- I imagine someone smarter than me about floating point could write one that would guarantee that the end point was exact, and the steps were within floating point error of exact. CHB
I think it's helpful to think about the potential problems though.
Henry
On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote:
On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall <heng@cantab.net> wrote:
On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote:
there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers.
How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option?
arange is designed for ints and gives you a half-open interval, linspace is designed for floats and gives you a closed interval. This means that when arange is used on floats, it does weird things that linspace doesn't:
In [11]: eps = np.finfo(float).eps
In [12]: np.arange(0, 1, step=0.2) Out[12]: array([ 0. , 0.2, 0.4, 0.6, 0.8])
In [13]: np.arange(0, 1 + eps, step=0.2) Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
In [14]: np.linspace(0, 1, 6) Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
In [15]: np.linspace(0, 1 + eps, 6) Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
The half-open/closed thing also has effects on what kind of api is reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like python range(0, 10, 8)). linspace(0, 1, step=0.8) is just incoherent, though, because linspace guarantees that both the start and end points are included.
My usual hack to deal with the numerical bounds issue is to add/subtract half the step.
Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-).
The problem is to figure out exactly how strict we should be. Like, presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 to 0.5 or 1. That would clearly violate "in the face of ambiguity, refuse the temptation to guess".
OTOH, as Sebastian points out, requiring that the step be *exactly* a divisor of the value (stop - start), within 1 ULP, is probably obnoxious.
Would anything bad happen if we just required that, say, (stop - start)/step had to be within "np.allclose" of an integer, i.e., to some reasonable relative and absolute precision, and then rounded the number of steps to match that integer exactly?
I was a bit worried about what happens for a huge number of steps. I have to rethink it a bit, but I guess one should be able to relax it... or maybe someone here has a nice idea on how to relax it. It seems to me that there is a bit of a trade-off once you get into the millions-of-steps range, because absolute errors that make sense for a few steps are suddenly on the order of whole integers.
-n
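To make that trade-off concrete (my own arithmetic, not from the thread): with np.allclose-style default tolerances the acceptance window grows with the ratio itself, so for millions of steps almost any step would be accepted:

    import numpy as np

    # Default tolerances: atol=1e-8, rtol=1e-5.  Near a ratio of 1e6 the
    # window rtol * ratio is roughly 10, i.e. it spans several integers,
    # so "round to the nearest integer" is no longer a meaningful check.
    ratio = 1.0e6 + 3.7            # clearly not an integer number of steps
    print(np.isclose(ratio, round(ratio), rtol=1e-5, atol=1e-8))   # True -- accepted anyway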
participants (6)
- Alan G Isaac
- Chris Barker - NOAA Federal
- Henry Gomersall
- Nathaniel Smith
- Sebastian Berg
- Warren Weckesser