
Hi everyone,

This was reported yesterday as a bug in Debian's numpy package:

>>> len(numpy.arange(0, 0.6, 0.1)) == len(numpy.arange(0, 0.4+0.2, 0.1))
False

The cause is this:

>>> ceil((0.4+0.2)/0.1)
7.0
>>> ceil(0.6/0.1)
6.0

which holds for both numpy's and the standard library's ceil(). Using arange in this way is a fundamentally unreliable thing to do, but is there anything we want to do about this? Should numpy emit a warning when arange is used with floating-point values and (stop-start)/step is close to an integer?

-- Ed
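The mismatch is easy to see directly in an interactive session (a quick sketch; the printed digits assume a recent CPython and vary by version):

>>> 0.6 == 0.4 + 0.2        # the two "0.6" values differ in the last bit
False
>>> (0.4 + 0.2) / 0.1       # lands just above 6, so ceil() returns 7
6.000000000000001
>>> 0.6 / 0.1               # lands just below 6, so ceil() returns 6
5.999999999999999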

this is really annoying. Matlab handles the "ceil" weirdness quite well, though.
--------------------------------------------------------------
>> ceil(0.6/0.1)
ans = 6
>> ceil((0.4+0.2)/0.1)
ans = 7
>> 0:0.1:0.6
ans = 0  1.000000000000000e-001  2.000000000000000e-001  3.000000000000000e-001  4.000000000000000e-001  5.000000000000000e-001  6.000000000000000e-001
>> 0:0.1:(0.4+0.2)
ans = 0  1.000000000000000e-001  2.000000000000000e-001  3.000000000000000e-001  4.000000000000001e-001  5.000000000000001e-001  6.000000000000001e-001
>> length(0:0.1:0.6) == length(0:0.1:(0.4+0.2))
ans = 1
--------------------------------------------------------------

hth,
L.

On 9/14/07, Ed Schofield <edschofield@gmail.com> wrote:
Hi everyone,
This was reported yesterday as a bug in Debian's numpy package:
>>> len(numpy.arange(0, 0.6, 0.1)) == len(numpy.arange(0, 0.4+0.2, 0.1))
False
The cause is this:
>>> ceil((0.4+0.2)/0.1)
7.0
>>> ceil(0.6/0.1)
6.0
which holds for both numpy's and the standard library's ceil().
Using arange in this way is a fundamentally unreliable thing to do, but is there anything we want to do about this? Should numpy emit a warning when arange is used with floating-point values and (stop-start)/step is close to an integer?
-- Ed

On 9/14/07, lorenzo bolla <lbolla@gmail.com> wrote:
this is really annoying. Matlab handles the "ceil" weirdness quite well, though.
--------------------------------------------------------------
>> ceil(0.6/0.1)
ans = 6
>> ceil((0.4+0.2)/0.1)
ans = 7
>> 0:0.1:0.6
ans = 0  1.000000000000000e-001  2.000000000000000e-001  3.000000000000000e-001  4.000000000000000e-001  5.000000000000000e-001  6.000000000000000e-001
>> 0:0.1:(0.4+0.2)
ans = 0  1.000000000000000e-001  2.000000000000000e-001  3.000000000000000e-001  4.000000000000001e-001  5.000000000000001e-001  6.000000000000001e-001
Well, in Matlab the end point is specified and the result of the division is probably rounded, so in order to have problems you might need to use something like .55 as the endpoint. In Numpy's arange an upper bound is used instead, so roundoff is a problem, but the .55 case would be handled easily.

Chuck
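One way to picture the endpoint-based construction described above is the rough sketch below; colon() is a made-up helper for illustration, not Matlab's actual algorithm, and it assumes a positive step:

import numpy as np

def colon(start, step, stop):
    # Round the count first; fall back to floor only when the rounded
    # count would genuinely overshoot the specified endpoint.
    n = int(round((stop - start) / step))
    if abs(start + n * step - stop) > 1e-9 * abs(step):
        n = int(np.floor((stop - start) / step))
    return start + step * np.arange(n + 1)

len(colon(0, 0.1, 0.6)) == len(colon(0, 0.1, 0.4 + 0.2))   # True: 7 points each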

Might using

    min(ceil((stop-start)/step), ceil((stop-start)/step - r))

with r = finfo(double).resolution, instead of ceil((stop-start)/step), perhaps be useful?

Joris

On 14 Sep 2007, at 11:37, Ed Schofield wrote:
Hi everyone,
This was reported yesterday as a bug in Debian's numpy package:
>>> len(numpy.arange(0, 0.6, 0.1)) == len(numpy.arange(0, 0.4+0.2, 0.1))
False
The cause is this:
>>> ceil((0.4+0.2)/0.1)
7.0
>>> ceil(0.6/0.1)
6.0
which holds for both numpy's and the standard library's ceil().
Using arange in this way is a fundamentally unreliable thing to do, but is there anything we want to do about this? Should numpy emit a warning when arange is used with floating-point values and (stop-start)/step is close to an integer?
-- Ed

I thought this is what the linspace function was written for in numpy. Why not use that? It works just like you would want, always including the final point.

--- Joris De Ridder <Joris.DeRidder@ster.kuleuven.be> wrote:
Might using
min(ceil((stop-start)/step), ceil((stop-start)/step-r))
with r = finfo(double).resolution instead of ceil((stop-start)/step) perhaps be useful?
Joris
-- Lou Pecora, my views are my own.

On 14 Sep 2007, at 15:54, Lou Pecora wrote:
I thought this is what the linspace function was written for in numpy. Why not use that?
AFAIK, linspace() is written to generate N evenly spaced numbers between start and stop inclusive. Similar but not quite the same as arange().
It works just like you would want always including the final point.
The example I gave was actually meant to _avoid_ inclusion of the last point. E.g.

In [93]: arange(0.0, 0.4+0.2, 0.1)
Out[93]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6])

In [94]: myrange(0.0, 0.4+0.2, 0.1)
Out[94]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5])

where myrange() is an ad hoc replacement for arange():

from numpy import arange, ceil, double, finfo

def myrange(start, stop, step):
    r = finfo(double).resolution
    N = min(ceil((stop-start)/step), ceil((stop-start)/step - r))
    return start + arange(N) * step

I'm not 100% sure that the above version of myrange() wouldn't generate surprising results in some cases. If it doesn't, why not include it in (the C version of) arange()? I don't think users actually count on the inclusion of the end point in some cases, so it would not break code. It would, however, avoid some surprises from time to time. From Lorenzo's example, it seems that Matlab always includes the endpoint. How exactly is their arange equivalent defined?

Joris

Ed Schofield wrote:
Hi everyone,
This was reported yesterday as a bug in Debian's numpy package:
>>> len(numpy.arange(0, 0.6, 0.1)) == len(numpy.arange(0, 0.4+0.2, 0.1))
False
The cause is this:
>>> ceil((0.4+0.2)/0.1)
7.0
>>> ceil(0.6/0.1)
6.0
which holds for both numpy's and the standard library's ceil().
>>> 0.6 == (0.4+0.2)
False
Consequently, not a bug.
Using arange in this way is a fundamentally unreliable thing to do, but is there anything we want to do about this?
Tell people to use linspace(). Yes, it does a slightly different thing; that's why it works. Most uses of floating point arange() can be cast using linspace() more reliably.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 14/09/2007, Robert Kern <robert.kern@gmail.com> wrote:
Ed Schofield wrote:
Using arange in this way is a fundamentally unreliable thing to do, but is there anything we want to do about this?
Tell people to use linspace(). Yes, it does a slightly different thing; that's why it works. Most uses of floating point arange() can be cast using linspace() more reliably.
I would like to point out in particular that numpy's linspace can leave out the last point (something I often want to do):

Definition: linspace(start, stop, num=50, endpoint=True, retstep=False)
Docstring:
    Return evenly spaced numbers.

    Return num evenly spaced samples from start to stop. If endpoint
    is True, the last sample is stop. If retstep is True then return
    the step value used.

This is one of those cases where "from pylab import *" is going to bite you, though, because its linspace doesn't. You can always fake it with linspace(a,b,N+1)[:-1].

Anne
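A small numpy sketch contrasting the two spellings (the values are taken from the thread's running example):

import numpy as np

a = np.linspace(0.0, 0.6, 6, endpoint=False)   # leaves out the endpoint
b = np.linspace(0.0, 0.6, 6 + 1)[:-1]          # the fake, when endpoint= is unavailable
print(np.allclose(a, b))                       # True: both are 0.0, 0.1, ..., 0.5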

the question is how to reduce user astonishment.
IMHO this is exactly the point. There seem to be two questions here: 1) do we want to reduce user astonishment, and 2) if yes, how could we do this? Not everyone seems to be convinced on the first question, replying that in many cases linspace() could well replace arange(). In many cases, yes, but not all. For some cases arange() has its legitimate use, even for floating point, and in these cases you may get bitten by the inexact number representation. If Matlab seems to be able to avoid surprises, why not numpy?

Joris

On 9/14/07, Joris De Ridder <Joris.DeRidder@ster.kuleuven.be> wrote:
the question is how to reduce user astonishment.
IMHO this is exactly the point. There seem to be two questions here: 1) do we want to reduce user astonishment, and 2) if yes, how could we do this? Not everyone seems to be convinced on the first question, replying that in many cases linspace() could well replace arange(). In many cases, yes, but not all. For some cases arange() has its legitimate use, even for floating point, and in these cases you may get bitten by the inexact number representation. If Matlab seems to be able to avoid surprises, why not numpy?
Perhaps because it's a bad idea? This case may be different, but in general, in cases where you try to sweep the surprising nature of floating point under the rug, you are never entirely successful. The end result is that, although surprises crop up with less regularity, they are much, much harder to diagnose and understand when they do crop up.

If arange can be "fixed" in a way that's easy to understand, then great. However, if the algorithm for deciding the points is anything but dirt simple, leave it alone. Or, perhaps, deprecate floating point values as arguments. I'm not very convinced by the arguments advanced thus far that arange with floating point has legitimate uses. I've certainly used it this way myself, but I believe that all of my uses could easily be replaced with either linspace or arange with integer arguments. I suspect that cases where the exact properties of arange are required are few and far between, and it's easy enough to simulate the current behaviour if needed. An advantage to that is that the potential pitfalls become obvious when you roll your own version.

-- tim.hochberg@ieee.org
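The integer-argument spelling mentioned above side-steps the ambiguity entirely, since the number of points is fixed by an exact integer count (a sketch of the idiom, not a universal drop-in replacement):

import numpy as np

# Instead of np.arange(0, 0.4 + 0.2, 0.1), whose length depends on roundoff:
xs = 0.1 * np.arange(6)   # always exactly 6 points: 0.0, 0.1, ..., 0.5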

On 9/14/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 9/14/07, Joris De Ridder <Joris.DeRidder@ster.kuleuven.be> wrote:
the question is how to reduce user astonishment.
IMHO this is exactly the point. There seem to be two questions here: 1) do we want to reduce user astonishment, and 2) if yes, how could we do this? Not everyone seems to be convinced on the first question, replying that in many cases linspace() could well replace arange(). In many cases, yes, but not all. For some cases arange() has its legitimate use, even for floating point, and in these cases you may get bitten by the inexact number representation. If Matlab seems to be able to avoid surprises, why not numpy?
Perhaps because it's a bad idea? This case may be different, but in general in cases where you try to sweep the surprising nature of floating point under the rug, you are never entirely successful. The end result is that, although surprises crop up with less regularity, they are much, much harder to diagnose and understand when they do crop up.
Exactly. The problem becomes even more dependent on particular circumstance. For instance, if (.2 + .2 + .1) is used instead of (.2 + .4).

If arange can be "fixed" in a way that's easy to understand, then great.
However, if the algorithm for deciding the points is anything but dirt simple, leave it alone. Or, perhaps, deprecate floating point values as arguments. I'm not very convinced by the arguments advanced thus far that arange with floating point has legitimate uses. I've certainly used it this way myself, but I believe that all of my uses could easily be replaced with either linspace or arange with integer arguments. I suspect that cases where the exact properties of arange are required are few and far between, and it's easy enough to simulate the current behaviour if needed. An advantage to that is that the potential pitfalls become obvious when you roll your own version.
In the case of arange, it should be possible to determine when the result is potentially ambiguous and issue a warning. For instance, when the argument of the ceil function is close to its rounded value.

Chuck

Charles R Harris wrote:
In the case of arange it should be possible to determine when the result is potentially ambiguous and issue a warning. For instance, if the argument of the ceil function is close to its rounded value.
What's "close"? The appropriate tolerance depends on the operations that would cause error. For literal inputs, where the only source of error is representation error, 1 eps would suffice, but then so would linspace(). For results of other computations, you might need more than 1 eps. But if you're doing computations, then it oughtn't to matter whether you get the endpoint or not (since you don't know what the values are anyway). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On 9/14/07, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
In the case of arange it should be possible to determine when the result is potentially ambiguous and issue a warning. For instance, if the argument of the ceil function is close to its rounded value.
What's "close"? The appropriate tolerance depends on the operations that would cause error. For literal inputs, where the only source of error is representation error, 1 eps would suffice, but then so would linspace(). For results of other computations, you might need more than 1 eps. But if you're doing computations, then it oughtn't to matter whether you get the endpoint or not (since you don't know what the values are anyway).
I would make 'close' very rough, maybe a relative 100*eps. The point would be to warn of *potential* problems and suggest linspace or some other approach, not to warn only on real problems. My guess is that most uses of arange are either well defined or such that a less ambiguous approach should be used. In a way, the warning would be a guess at programmer intent, and a gentler solution than making arange integer-only.

Chuck
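A sketch of such a warning with the rough 100*eps threshold suggested above; checked_arange and the exact test are illustrative, not anything that exists in numpy:

import warnings
import numpy as np

def checked_arange(start, stop, step):
    q = (stop - start) / step
    # Warn when the point count sits within ~100 eps of an integer: the
    # length of the result then hinges on roundoff in the inputs.
    if abs(q - round(q)) < 100 * np.finfo(float).eps * max(1.0, abs(q)):
        warnings.warn("arange length is roundoff-sensitive; consider linspace",
                      RuntimeWarning, stacklevel=2)
    return np.arange(start, stop, step)

checked_arange(0, 0.4 + 0.2, 0.1)   # warns: 7 points here, but 6 with stop=0.6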

Joris De Ridder wrote:
the question is how to reduce user astonishment.
IMHO this is exactly the point. There seem to be two questions here: 1) do we want to reduce user astonishment, and 2) if yes, how could we do this? Not everyone seems to be convinced on the first question, replying that in many cases linspace() could well replace arange(). In many cases, yes, but not all. For some cases arange() has its legitimate use, even for floating point, and in these cases you may get bitten by the inexact number representation. If Matlab seems to be able to avoid surprises, why not numpy?
Here's the thing: binary floating point is intrinsically surprising to people who are only accustomed to decimal. The way to not be surprised is to not use binary floating point. You can hide some of the surprises, but not all of them. When you do try to hide them, all you are doing is creating complicated, ad hoc behavior that is also difficult to predict; for those who have become accustomed to binary floating point's behavior, it's not clear what the "unastonishing" behavior is supposed to be, but binary floating point's is well-defined.

Binary floating point is a useful tool for many things. I'm not interested in making numpy something that hides that tool's behavior in order to force it into a use it is not appropriate for.

-- Robert Kern

Robert Kern wrote:
Here's the thing: binary floating point is intrinsically surprising to people who are only accustomed to decimal.
Very good point. Binary arithmetic is NOT less accurate than decimal arithmetic; it just has different values that it can't represent exactly. So no one is surprised that 1.0/3.0 isn't represented exactly! The confusion stems from the fact that we use decimal literals even when using binary arithmetic, but you just need to learn to get used to it.

For what it's worth, the MATLAB mailing list has a constant trickle of notes from new users along the lines of "MATLAB is broken!" when they have encountered binary-decimal issues like these. It is inescapable. Binary representation was one of the first things I learned in my first computer class, using Basic, over 25 years ago (am I really that old!). You really need to learn at least a tiny bit about binary if you're going to do math with computers.

Oh, and could someone post an actual example of a use for which FP arange is required (with fudges to try to accommodate decimal-to-binary conversion errors), and linspace won't do?

-Chris

-- Christopher Barker, Ph.D.
Oceanographer, NOAA/OR&R/HAZMAT

On 15/09/2007, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Oh, and could someone post an actual example of a use for which FP arange is required (with fudges to try to accommodate decimal to binary conversion errors), and linspace won't do?
Well, here's one: evaluating a function we know to be bandlimited to N harmonics and positive, trying to bracket a maximum. We know it doesn't change much faster than T/N, so I might use

    xs = arange(0, T, 1/float(4*N))

and then evaluate the function there. Of course, I don't care how many points there are, so no fudges please. But floating-point arange is certainly useful here; to use linspace or integer arange I'd have to write it in a much clumsier way. (Okay, a little clumsier.)

In fact, reluctant as I am to provide arguments in favour of godawful floating-point fudges, if I have the harmonics I can use irfft to evaluate my function. I'll then have to carefully calculate the x-values where irfft evaluates, and an off-by-one problem is going to cause my program to fail. I would use integer arange and scale as appropriate, but there's something to be said for using floating-point arange. linspace(..., endpoint=False) is fine, though.

Anne
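A sketch of the scenario above, with made-up values for T, N, and the harmonics, and reading "much faster than T/N" as a spacing of T/(4N):

import numpy as np

T, N = 2.0, 8                           # period and harmonic count (assumed)
xs = np.arange(0.0, T, T / (4.0 * N))   # evenly spaced; the exact count is irrelevant

# The irfft route: n samples land implicitly at j*T/n, so the x grid must
# be built to match or everything silently misaligns.
c = np.zeros(N + 1, dtype=complex)      # hypothetical harmonic coefficients
c[2] = 0.5
n = 4 * N
f = np.fft.irfft(c, n) * n              # undo irfft's 1/n normalization
x = np.arange(n) * (T / n)              # integer arange, scaled: no fudges needed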

Christopher Barker wrote:
Very good point. Binary arithmetic is NOT less accurate than decimal arithmetic; it just has different values that it can't represent exactly. ...
Quibble: any number that can be represented exactly in binary can also be represented in decimal, but not vice versa, so binary can indeed be less accurate for some numbers. -- Anton Sherwood, http://www.ogre.nu/ "How'd ya like to climb this high *without* no mountain?" --Porky Pine
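The asymmetry is easy to demonstrate with Python's decimal module (assuming standard IEEE doubles; the long string below is the exact expansion):

from decimal import Decimal

# Every finite binary float has an exact, finite decimal expansion:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The reverse fails: the exact decimal 0.1 has no finite binary expansion,
# which is why the float literal 0.1 is already rounded.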

On 14 Sep 2007, at 23:51, Robert Kern wrote:
You can hide some of the surprises, but not all of them.
I guess it's impossible to make a bullet-proof "fix". When arange() gets a 'stop' value of 0.60000000000000009, it cannot possibly know whether this stop value is supposed to be 0.6, or whether it is the result of a genuine computation that has nothing to do with inexact number representation. In the latter case, I would definitely want arange() to keep working as it does now. It seems, though, that if linspace() is the better equivalent of arange(), its default endpoint=True option is a little inconvenient (but by no means a problem), as you would always have to reset it to emulate arange() behaviour.

Joris

On 9/14/07, Ed Schofield <edschofield@gmail.com> wrote:
Hi everyone,
This was reported yesterday as a bug in Debian's numpy package:
>>> len(numpy.arange(0, 0.6, 0.1)) == len(numpy.arange(0, 0.4+0.2, 0.1))
False
The cause is this:
>>> ceil((0.4+0.2)/0.1)
7.0
>>> ceil(0.6/0.1)
6.0
which holds for both numpy's and the standard library's ceil().
Since none of the numbers are exactly represented in IEEE floating point, this sort of oddity is expected. If you look at the exact values, (.4 + .2)/.1 > 6 and .6/.1 < 6. That said, I would expect something like ceil(interval/delta - relatively_really_small_number) to generally return the expected result. Matlab probably plays these sorts of games. The downside is encouraging bad programming habits; in this case, the programmer should be using linspace.

Chuck
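A sketch of that fudged ceil; fuzzy_ceil is a hypothetical helper, and its tolerance is arbitrary, which is precisely the weakness:

import numpy as np

def fuzzy_ceil(x, rtol=1e-12):
    # Forgive a tiny upward roundoff error before taking the ceiling.
    return np.ceil(x - rtol * abs(x))

print(np.ceil((0.4 + 0.2) / 0.1))      # 7.0 -- the surprise
print(fuzzy_ceil((0.4 + 0.2) / 0.1))   # 6.0 -- the "expected" count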

On 14/09/2007, Charles R Harris <charlesr.harris@gmail.com> wrote:
Since none of the numbers are exactly represented in IEEE floating point, this sort of oddity is expected. If you look at the exact values, (.4 + .2)/.1 > 6 and .6/.1 < 6. That said, I would expect something like ceil(interval/delta - relatively_really_small_number) to generally return the expected result. Matlab probably plays these sorts of games. The downside is encouraging bad programming habits; in this case, the programmer should be using linspace.
There is actually a context in which floating-point arange makes sense: when you want evenly-spaced points and don't much care how many there are. No reason to play games in this context of course; the question is how to reduce user astonishment.

Anne

On 9/14/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
Since none of the numbers are exactly represented in IEEE floating point, this sort of oddity is expected. If you look at the exact values, (.4 + .2)/.1 > 6 and .6/.1 < 6.
Just for my own benefit (and to pass the time), here are the actual numbers I get in my PyShell:

>>> 0.6 == (0.4+0.2)
False
>>> `.6`
'0.59999999999999998'
>>> `.4`
'0.40000000000000002'
>>> `.2`
'0.20000000000000001'
>>> `.2+.4`
'0.60000000000000009'

To my naive eye this is just "fantastic" ... ;-)

-Sebastian Haase

PS: you might even notice that "1+2 = 9" ;-)

On Friday 14 September 2007 20:12, Charles R Harris wrote:
Since none of the numbers are exactly represented in IEEE floating point, this sort of oddity is expected. If you look at the exact values, (.4 + .2)/.1 > 6 and .6/.1 < 6. That said, I would expect
You hit send too fast! The fractions that can be represented exactly in binary are 1/2, 1/4, 1/8, ... and not 2/10, 4/10, 8/10, ... See here:

In [1]: 0.5 == .25 + .25
Out[1]: True

In [2]: .5
Out[2]: 0.5

In [3]: .25
Out[3]: 0.25

In [4]: .125
Out[4]: 0.125

In [8]: .375 == .25 + .125
Out[8]: True

Regards,
Eike.
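One way to see which fractions qualify is float.hex(), which prints the exact binary significand (float.hex() arrived in Python 2.6, so slightly after this thread):

>>> (0.625).hex()   # 0.625 == 1/2 + 1/8, so it is exact
'0x1.4000000000000p-1'
>>> (0.1).hex()     # 1/10 has no finite binary expansion, so it is rounded
'0x1.999999999999ap-4'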
participants (12)
- Anne Archibald
- Anton Sherwood
- Charles R Harris
- Christopher Barker
- Ed Schofield
- Eike Welk
- Joris De Ridder
- lorenzo bolla
- Lou Pecora
- Robert Kern
- Sebastian Haase
- Timothy Hochberg