floating point arithmetic issue
Hello, I ran into a difficulty with floating point arithmetic in python. Namely that:
>>> 0.001 + 1 - 1
0.00099999999999988987
And, as a consequence, in python:
>>> 0.001 + 1 - 1 == 0.001
False
In more detail, my problem is that I have a function which needs to compute (a + b - c) % a. For b == c, you would expect the result to be 0 whatever the value of a. But it isn't...
>>> (0.001 + 1 - 1) % 0.001
0.00099999999999988987
Is there any way to solve this? Many thanks, Guillaume
Hi Guillaume, On 07/30/2010 07:45 PM, Guillaume Chérel wrote:
Hello,
I ran into a difficulty with floating point arithmetic in python.
This is not Python-specific, but true in any language which uses the floating point unit of your hardware, assuming you have "conventional" hardware.
Namely that:
>>> 0.001 + 1 - 1
0.00099999999999988987
And, as a consequence, in python:
>>> 0.001 + 1 - 1 == 0.001
False
In more detail, my problem is that I have a function which needs to compute (a + b - c) % a. For b == c, you would expect the result to be 0 whatever the value of a. But it isn't...
Indeed, it is not, and that's expected. There are various pitfalls using floating point. Rationale and explanations: http://docs.sun.com/source/806-3568/ncg_goldberg.html
Is there any way to solve this?
Using % with floats is generally a bad idea, IMO. You have not said what you are trying to do; the solution may be to use integers, or to do it differently, but we need more info.

cheers,

David
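For illustration, here is a minimal sketch of the integer approach. The concrete numbers are an assumption: they presume b and c are exact multiples of the grid spacing a, which is what makes integer arithmetic applicable.

```python
# Floating point: (a + b - c) % a need not be 0 even when b == c.
a, b, c = 0.001, 1.0, 1.0
print((a + b - c) % a)  # a tiny nonzero value, not 0.0

# Integer grid units: measure everything in multiples of the spacing a,
# assuming b and c are exact multiples of a. Integer % is exact.
ai, bi, ci = 1, 1000, 1000  # a = 1*a, b = 1000*a, c = 1000*a
print((ai + bi - ci) % ai)  # exactly 0
```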
Fri, 30 Jul 2010 19:52:37 +0900, David wrote: [clip]
Indeed, it is not, and that's expected. There are various pitfalls using floating point. Rationale and explanations:
In case of tl;dr, see also http://floating-point-gui.de/ -- Pauli Virtanen
Hi,

Thanks for all your answers and the references (and yes, I have to admit that I've been a bit lazy with Goldberg's article, though it looks very thorough).

But since numpy is designed for scientific computing, is there no implementation of an "exact type" (http://floating-point-gui.de/formats/exact/) to avoid floating point issues?

As for the details of my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk. I use the mod operator in a calculation to find the coordinates of the left-most point of the grid that falls into a given disk, and I do that for all disks. I need to use floating point numbers with this modulo operator because I want the resolution of the grid to be arbitrary.

I suppose my explanation might not be very straightforward, so if you know of another good way to compute the surface of overlapping disks, I'd be glad to know.

Thanks,
Guillaume

On 30/07/2010 13:21, Pauli Virtanen wrote:
Fri, 30 Jul 2010 19:52:37 +0900, David wrote: [clip]
Indeed, it is not, and that's expected. There are various pitfalls using floating point. Rationale and explanations:
http://docs.sun.com/source/806-3568/ncg_goldberg.html In case of tl;dr, see also http://floating-point-gui.de/
On 7/30/2010 8:21 AM, Guillaume Chérel wrote:
is there no implementation of an "exact type"
You could use the fractions module:

>>> from fractions import Fraction
>>> import numpy as np
>>> f = [Fraction(i, 10) for i in range(10)]
>>> a = np.array(f, dtype=object)
>>> a
array([0, 1/10, 1/5, 3/10, 2/5, 1/2, 3/5, 7/10, 4/5, 9/10], dtype=object)
But I don't see why you would. Your computation will be approximate in any case. Alan Isaac
2010/7/30 Guillaume Chérel <guillaume.c.cherel@gmail.com>:
Hi,
Thanks for all your answers and the references (and yes, I have to admit that I've been a bit lazy with Goldberg's article, though it looks very thorough).
But as numpy is designed for scientific computing, is there no implementation of an "exact type" (http://floating-point-gui.de/formats/exact/) to avoid floating point issues?
I think there is a misunderstanding: the *vast* majority of scientific computing uses floating point. The most commonly used way of doing "exact" computation is to do it symbolically, which is inappropriate in most practical cases. Arbitrary precision is also very different from exact precision - arbitrary precision means you are trading speed/memory for more precision, but there will still be errors.

I don't really understand the exact problem you are trying to solve, but since you are approximating something, I am doubtful you need exact precision, and in that case, floating point as used in numpy (and every other software doing *numerical* computation) is likely to be enough.

cheers,

David
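As an illustration of the distinction, the standard-library decimal module offers arbitrary precision: raising the precision shrinks the error for a number like 1/3, but cannot eliminate it.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits instead of a double's ~16
third = Decimal(1) / Decimal(3)
print(third * 3)       # 0.9999...9, closer to 1 than a float gets, still not 1
print(third * 3 == 1)  # False: arbitrary precision is not exact precision
```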
Guillaume Chérel wrote:
As for the details about my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk.
That is a highly approximate way to do it - which may be fine, but I doubt any floating point errors you get are going to make it worse.
if you know of another good way to compute the surface of overlapping disks, I'd be glad to know.
Are these "disks" exactly round? If so -- use coordinate geometry to calculate it exactly (or as exactly as floating point allows ;-) )

I just googled: "area of intersecting circles" and got a bunch of hits.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Fri, Jul 30, 2010 at 6:15 PM, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Guillaume Chérel wrote:
As for the details about my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk.
That is a highly approximate way to do it - which may be fine, but I doubt any floating point errors you get are going to make it worse.
It is a problem in my algorithm because it modifies the number of points that are considered inside a given disk (namely, the points that are at the very boundary of the disk), and results in something like an "index out of bounds" error later.
> if you know of another good way to compute the surface of overlapping disks, I'd be glad to know.
Are these "disks" exactly round? If so -- use coordinate geometry to calculate it exactly (or as exactly as floating point allows ;-) )
I just googled: "area of intersecting circles"
And got a bunch of hits.
Originally, I was looking to solve my problem quickly, and a rough approximation was (and is) enough. There are indeed solutions for computing the area of 2 intersecting disks, but my problem may involve many more than 2 disks at a time, which makes things a lot more complicated (to me, at least).
2010/7/30 Guillaume Chérel <guillaume.c.cherel@gmail.com>
On Fri, Jul 30, 2010 at 6:15 PM, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Guillaume Chérel wrote:
As for the details about my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk.
That is a highly approximate way to do it - which may be fine, but I doubt any floating point errors you get are going to make it worse.
It is a problem in my algorithm because it modifies the number of points that are considered inside a given disk (namely, the points that are at the very boundary of the disk), and results in something like an "index out of bounds" error later.
if you know of another good way to compute the surface of overlapping disks, I'd be glad to know.
Are these "disks" exactly round? If so -- use coordinate geometry to calculate it exactly (or as exactly as floating point allows ;-) )
I just googled: "area of intersecting circles"
And got a bunch of hits.
Originally, I was looking to solve my problem quickly, and a rough approximation was (and is) enough. There are indeed solutions for computing the area of 2 intersecting disks, but my problem may involve many more than 2 disks at a time, which makes things a lot more complicated (to me, at least).
Supposing that you know the center and radius of each disk, you could brute-force a calculation of the distance from each point you are sampling to the center of each disk, and see whether the distance is within the radius of the particular disk. To make it a little more efficient, you can use SciPy's kd-trees and build two trees: one for the points you wish to sample, and another containing the centers of the disks, and use that to find the closest points to each center.

Ben Root
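A rough sketch of the kd-tree idea (assuming SciPy is available; the variable names and random example data are illustrative, not from the thread). A single tree over the grid points is queried once per disk for the points within that disk's radius:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical example data: 50 random disks.
rng = np.random.default_rng(0)
centers = rng.standard_normal((50, 2))
radii = rng.random(50) * 0.1

# Flatten a 200 x 200 grid on [-1, 1] x [-1, 1] into a list of points.
gx, gy = np.mgrid[-1:1:200j, -1:1:200j]
points = np.column_stack([gx.ravel(), gy.ravel()])

# For each disk, mark the grid points that fall inside it.
tree = cKDTree(points)
covered = np.zeros(len(points), dtype=bool)
for center, r in zip(centers, radii):
    covered[tree.query_ball_point(center, r)] = True

# Fraction of covered points times the area of the bounding box.
area = covered.sum() * 4.0 / len(points)
```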
Fri, 30 Jul 2010 14:21:23 +0200, Guillaume Chérel wrote: [clip]
As for the details about my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk.
HTH,

import numpy as np

# some random circles
x = np.random.randn(200)
y = np.random.randn(200)
radius = np.random.rand(200)*0.1

# grid, [-1, 1] x [-1, 1], 200 x 200 points
xmin, xmax = -1, 1
ymin, ymax = -1, 1
grid_x, grid_y = np.mgrid[xmin:xmax:200j, ymin:ymax:200j]

# count
mask = np.zeros((200, 200), bool)
for xx, yy, rr in zip(x, y, radius):
    mask |= (grid_x - xx)**2 + (grid_y - yy)**2 < rr**2

area = mask.sum() * (xmax-xmin) * (ymax-ymin) / float(mask.size)
Hi Pauli,

Your solution is really good. It's almost exactly what I was doing, but shorter, and I didn't know about the mgrid thing.

There is an extra optimization step I perform in my code to avoid computations for the many points of the grid which (assuming the circles are relatively small) are known to be outside a circle from their x or y coordinate alone, without computing the distance. And that's where I need the modulo operator: to compute the left-most and top-most points of the grid that are inside the circle.

I'll try your solution; maybe my optimization isn't that necessary with the mgrid trick.

On Fri, Jul 30, 2010 at 7:15 PM, Pauli Virtanen <pav@iki.fi> wrote:
Fri, 30 Jul 2010 14:21:23 +0200, Guillaume Chérel wrote: [clip]
As for the details about my problem, I'm trying to compute the total surface of overlapping disks. I approximate the surface with a grid and count how many points of the grid fall into at least one disk.
HTH,
import numpy as np
# some random circles
x = np.random.randn(200)
y = np.random.randn(200)
radius = np.random.rand(200)*0.1

# grid, [-1, 1] x [-1, 1], 200 x 200 points
xmin, xmax = -1, 1
ymin, ymax = -1, 1
grid_x, grid_y = np.mgrid[xmin:xmax:200j, ymin:ymax:200j]

# count
mask = np.zeros((200, 200), bool)
for xx, yy, rr in zip(x, y, radius):
    mask |= (grid_x - xx)**2 + (grid_y - yy)**2 < rr**2
area = mask.sum() * (xmax-xmin) * (ymax-ymin) / float(mask.size)
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Fri, 30 Jul 2010 19:48:45 +0200, Guillaume Chérel wrote:
Your solution is really good. It's almost exactly what I was doing, but shorter and I didn't know about the mgrid thing.
It's a very brute-force solution and probably won't be very fast with many circles or large grids.
There is an extra optimization step I perform in my code to avoid computations for the many points of the grid which (assuming the circles are relatively small) are known to be outside a circle from their x or y coordinate alone, without computing the distance. And that's where I need the modulo operator: to compute the left-most and top-most points of the grid that are inside the circle.
If your circles are quite small, you probably want to clip the "painting" to a box not much larger than a single circle:

# untested, something like below
def point_to_index(x, y, pad=0):
    i = int(200 * (x - xmin) / (xmax - xmin)) + pad
    j = int(200 * (y - ymin) / (ymax - ymin)) + pad
    return np.clip(i, 0, 200), np.clip(j, 0, 200)

i0, j0 = point_to_index(xx - rr, yy - rr, pad=-2)
i1, j1 = point_to_index(xx + rr, yy + rr, pad=2)

box = np.index_exp[i0:i1, j0:j1]
mask[box] |= (grid_x[box] - xx)**2 + (grid_y[box] - yy)**2 < rr**2
# same as: mask[i0:i1, j0:j1] |= (grid_x[i0:i1, j0:j1] ...

--
Pauli Virtanen
Hi Pauli,
If your circles are quite small, you probably want to clip the "painting" to a box not much larger than a single circle:
# untested, something like below
def point_to_index(x, y, pad=0):
    i = int(200 * (x - xmin) / (xmax - xmin)) + pad
    j = int(200 * (y - ymin) / (ymax - ymin)) + pad
    return np.clip(i, 0, 200), np.clip(j, 0, 200)
I'm not sure I understand what you're doing with the clip function here. Could you say what the two returned values are supposed to represent?

Thanks,
Guillaume
i0, j0 = point_to_index(xx - rr, yy - rr, pad=-2)
i1, j1 = point_to_index(xx + rr, yy + rr, pad=2)
box = np.index_exp[i0:i1, j0:j1]
mask[box] |= (grid_x[box] - xx)**2 + (grid_y[box] - yy)**2 < rr**2
# same as: mask[i0:i1, j0:j1] |= (grid_x[i0:i1, j0:j1] ...
-- Pauli Virtanen
2010/7/30 Guillaume Chérel <guillaume.c.cherel@gmail.com>:
Hello,
I ran into a difficulty with floating point arithmetic in python. Namely that:
>>> 0.001 + 1 - 1
0.00099999999999988987
And, as a consequence, in python:
>>> 0.001 + 1 - 1 == 0.001
False
In more detail, my problem is that I have a function which needs to compute (a + b - c) % a. For b == c, you would expect the result to be 0 whatever the value of a. But it isn't...
>>> (0.001 + 1 - 1) % 0.001
0.00099999999999988987
Is there any way to solve this?
One simple way is to put brackets round the integer subtraction,
>>> 0.001 + 1 - 1 == 0.001
False
>>> 0.001 + (1 - 1) == 0.001
True
However, in general comparing floating point numbers is tricky and a complex issue in numerical computing. This is not a Python or NumPy specific problem ;) Peter
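In practice, the usual workaround for such comparisons is a tolerance-based test instead of ==, for instance with numpy.allclose:

```python
import numpy as np

x = 0.001 + 1 - 1
print(x == 0.001)             # False: exact comparison fails
print(np.allclose(x, 0.001))  # True: equal within a small tolerance
```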
2010/7/30 Guillaume Chérel <guillaume.c.cherel@gmail.com>:
Hello,
I ran into a difficulty with floating point arithmetic in python. Namely that:
>>> 0.001 + 1 - 1
0.00099999999999988987
And, as a consequence, in python:
>>> 0.001 + 1 - 1 == 0.001
False
In more detail, my problem is that I have a function which needs to compute (a + b - c) % a. For b == c, you would expect the result to be 0 whatever the value of a. But it isn't...
>>> (0.001 + 1 - 1) % 0.001
0.00099999999999988987
Is there any way to solve this?
You can do slightly better than normal floating point arithmetic using some special algorithms that compensate in one way or another for the error. See math.fsum: http://docs.python.org/library/math.html#math.fsum
>>> from math import fsum
>>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
0.9999999999999999
>>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
1.0
>>> fsum([0.001, +1, -1])
0.001
It won't be as fast, and naturally it won't always work, but it's something.
participants (9)
- Alan G Isaac
- Benjamin Root
- Christopher Barker
- David
- David Cournapeau
- Guillaume Chérel
- Ken Watford
- Pauli Virtanen
- Peter