Re: [Python-ideas] Python Numbers as Human Concept Decimal System
[CC back to the list because you posted the same argument there but without the numerical example, and my working through that might help others understand your point] On Fri, Mar 7, 2014 at 9:18 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
The main point I'm getting at is that by rounding 0.100000000000000012 to 0.1 instead of 0.10000000000000000555..., you're no longer rounding it to the nearest binary float, but instead to the second nearest Decimal(repr(binary float)) (since 0.10000000000000002 is closer than 0.1).
OK, let me walk through that carefully. Let's name the exact mathematical values and assign them to strings:
a = '0.100000000000000012'
b = '0.1000000000000000055511151231257827021181583404541015625'
c = '0.10000000000000002'
Today, Decimal(float(a)) == Decimal(b). Under my proposal, Decimal(float(a)) == Decimal('0.1'). The difference between float('0.1') and float(c) is 1 ulp (2**-56), and a lies between those two; it is closer to c than to 0.1, but even closer to b (in the other direction). IOW, for the mathematical values, 0.1 < b < a < c, where a is closer to b than to c, so if the choices for rounding a were b or c, b would be preferred. So far so good. (And still good if we replace c with the slightly smaller exact value of float(c).)

And your point is that if we change the allowable choices to '0.1' or c, we find that float(b) == float('0.1'), but a is closer to c than to 0.1. This is less than 1 ulp, but more than 0.5 ulp.

I find the argument intriguing, but I blame it more on what happens in float(a) than on what Decimal() does to the resulting value. If you actually had the string a and wanted to convert it to Decimal, you would obviously write Decimal(a), not Decimal(float(a)), so this is really only a problem when someone uses a as a literal in a program that is passed to Decimal, i.e. Decimal(0.100000000000000012). That's slightly unfortunate, but easy to fix by adding quotes.

The only place where I think something like this might occur in real life is when someone copies a numerical recipe involving some very precise constants and mindlessly applies Decimal() without string quotes to the constants. But that's a "recipe" for failure anyway, since if the recipe really uses more precision than IEEE double can handle, *with* the quotes the recipe would be calculated more exactly anyway. Perhaps another scenario would be if the constant was calculated (by the recipe-maker) to within 0.5 ulp using IEEE double and rendered with exactly the right number of digits. But these scenarios sound like either they should use the quotes anyway, or the calculation would be better off done in double rather than Decimal. So I think it's still pretty much a phantom problem.
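To make the walkthrough above concrete, here is a minimal sketch using today's decimal module. Decimal(repr(f)) is written out explicitly to stand in for what the proposal would have Decimal(f) return; nothing below relies on a hypothetical API.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough precision that the differences below are exact

a = Decimal('0.100000000000000012')
b = Decimal('0.1000000000000000055511151231257827021181583404541015625')
c = Decimal('0.10000000000000002')

f = float('0.100000000000000012')          # rounds to the double whose exact value is b
assert Decimal(f) == b                      # today: the exact value of that double
assert Decimal(repr(f)) == Decimal('0.1')   # the proposal, emulated via repr()

# Mathematical ordering and distances: 0.1 < b < a < c,
# with a closer to b than to c ...
assert Decimal('0.1') < b < a < c
assert a - b < c - a
# ... but closer to c than to 0.1, once b is no longer a candidate.
assert c - a < a - Decimal('0.1')
```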
Of course that's not true for all reals (0.1 being the obvious counterexample), but it's true for some with your proposal, while today it's true for none. So the mean absolute error in Decimal(repr(f)) across any range of reals is inherently higher than in Decimal.from_float(f). Put another way, you're adding additional rounding error. That additional rounding error is still less than the rule-of-thumb cutoff that people use when talking about going through float, but it's nonzero and not guaranteed to cancel out.
On top of that, the distribution of binary floats is uniform (well, more complicated than uniform because they have an exponent as well as a mantissa, but you know what I mean); the distribution of closest-repr values to binary floats is not.
I have no idea whether either of these are properties that users of Decimal (or, rather, Decimal and float together) care about. But they are properties that Decimal(float) has today that would be lost.
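One rough way to probe the mean-absolute-error claim above is a small simulation; the 18-digit sample values and the range [0.1, 1) are arbitrary choices for the sketch, not anything specified in the thread.

```python
import random
from decimal import Decimal, getcontext

getcontext().prec = 60  # headroom so the exact differences are representable

random.seed(0)
N = 100_000
err_exact = Decimal(0)  # today's behaviour: exact value of the nearest double
err_repr = Decimal(0)   # the proposal, emulated as Decimal(repr(f))

for _ in range(N):
    # an 18-digit decimal value in [0.1, 1)
    x = Decimal(random.randint(10**17, 10**18 - 1)).scaleb(-18)
    f = float(x)
    err_exact += abs(x - Decimal(f))
    err_repr += abs(x - Decimal(repr(f)))

print("mean |x - Decimal(f)|      :", err_exact / N)
print("mean |x - Decimal(repr(f))|:", err_repr / N)
```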
--
--Guido van Rossum (python.org/~guido)
From: Guido van Rossum <guido@python.org>
Sent: Friday, March 7, 2014 10:42 PM
[CC back to the list because you posted the same argument there but without the numerical example, and my working through that might help others understand your point]
Thank you for presenting my point better than I could; it's a lot clearer this way.
On Fri, Mar 7, 2014 at 9:18 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
The main point I'm getting at is that by rounding 0.100000000000000012 to 0.1 instead of 0.10000000000000000555..., you're no longer rounding it to the nearest binary float, but instead to the second nearest Decimal(repr(binary float)) (since 0.10000000000000002 is closer than 0.1).
OK, let me walk through that carefully. Let's name the exact mathematical values and assign them to strings:
a = '0.100000000000000012'
b = '0.1000000000000000055511151231257827021181583404541015625'
c = '0.10000000000000002'
Today, Decimal(float(a)) == Decimal(b). Under my proposal, Decimal(float(a)) == Decimal('0.1'). The difference between float('0.1') and float(c) is 1 ulp (2**-56), and a lies between those two; it is closer to c than to 0.1, but even closer to b (in the other direction). IOW, for the mathematical values, 0.1 < b < a < c, where a is closer to b than to c, so if the choices for rounding a were b or c, b would be preferred. So far so good. (And still good if we replace c with the slightly smaller exact value of float(c).)
And your point is that if we change the allowable choices to '0.1' or c, we find that float(b) == float('0.1'), but a is closer to c than to 0.1. This is less than 1 ulp, but more than 0.5 ulp.
Yes. It's the same two problems that inspired Clinger's correct rounding papers [1][2]: it does not have the closest-match property, and it can lose almost twice as much accuracy. But the context is very different, so I'm not sure Clinger's arguments are relevant here.

[1]: http://citeseer.ist.psu.edu/william90how.html
[2]: ftp://ftp.ccs.neu.edu/pub/people/will/retrospective.pdf
I find the argument intriguing, but I blame it more on what happens in float(a) than on what Decimal() does to the resulting value. If you actually had the string a and wanted to convert it to Decimal, you would obviously write Decimal(a), not Decimal(float(a)), so this is really only a problem when someone uses a as a literal in a program that is passed to Decimal, i.e. Decimal(0.100000000000000012).
Agreed on both counts. However, the entire problem you're trying to solve here is caused by what happens in float(a). You're effectively attempting to recover, in Decimal(), the information lost in float(), in a way that often does what people want and otherwise never does anything too bad. So long as giving up the correct-rounding property, doubling the error (but still staying under 1 ulp), and skewing the distribution of Decimals created this way (by less than 0.5 ulp, but possibly in a way that can accumulate) are not "too bad", I believe your proposal succeeds completely.
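A small sketch of the numbers behind "doubling the error (but still staying under 1 ulp)" for the running example; the ulp of 2**-56 is taken from the walkthrough above (doubles just above 0.1 are spaced 2**-56 apart).

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

a = Decimal('0.100000000000000012')
f = float(a)                    # the nearest double; its exact value is b
ulp = Decimal(2) ** -56         # spacing of doubles in [0.0625, 0.125)

err_today = abs(a - Decimal(f)) / ulp           # correctly rounded: <= 0.5 ulp
err_proposed = abs(a - Decimal(repr(f))) / ulp  # via repr: between 0.5 and 1 ulp

print(f"today:    {err_today:.4f} ulp")     # ~0.4647
print(f"proposed: {err_proposed:.4f} ulp")  # ~0.8647
```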
That's slightly unfortunate, but easy to fix by adding quotes.
Yes, but the motivating example for this whole thread, Decimal(1.1), is just as easy to fix by using quotes.

I think I can see the distinction: novices don't know to use quotes; people trying to implement numerical recipes in Python do (or at least really, really should); therefore a change that helps the former but hurts the latter, when they both leave off the quotes, is a net gain. Yes?
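For comparison, the thread's motivating example as it behaves today: with quotes the intent is captured exactly; without them you get the exact value of the nearest double.

```python
from decimal import Decimal

print(Decimal(1.1))    # 1.100000000000000088817841970012523233890533447265625
print(Decimal('1.1'))  # 1.1
```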