[CC back to the list because you posted the same argument there but without the numerical example, and my working through that might help others understand your point]

On Fri, Mar 7, 2014 at 9:18 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
The main point I'm getting at is that by rounding 0.100000000000000012 to 0.1 instead of 0.10000000000000000555..., you're no longer rounding it to the nearest binary float, but instead to the second nearest Decimal(repr(binary float)) (since 0.10000000000000002 is closer than 0.1).

OK, let me walk through that carefully. Let's name the exact mathematical values and assign them to strings:

>>> a = '0.100000000000000012'
>>> b = '0.1000000000000000055511151231257827021181583404541015625'
>>> c = '0.10000000000000002'

Today, Decimal(float(a)) == Decimal(b). Under my proposal, Decimal(float(a)) == Decimal('0.1'). The difference between float('0.1') and float(c) is 1 ulp (2**-56), and a lies between those; a is closer to c than to 0.1, but it is even closer to b (in the other direction). IOW for the mathematical values, 0.1 < b < a < c, where a is closer to b than to c. So if the choices for rounding a were b or c, b would be preferred. So far so good. (And still good if we replace c with the slightly smaller exact value of float(c).)
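
To spell that out at the interpreter (a quick sketch; the precision bump is only there so the subtractions below come out exact):

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 60   # enough digits to keep the subtractions exact
>>> Decimal('0.1') < Decimal(b) < Decimal(a) < Decimal(c)
True
>>> Decimal(a) - Decimal(b)   # a's distance from b ...
Decimal('6.4488848768742172978818416595458984375E-18')
>>> Decimal(c) - Decimal(a)   # ... is smaller than its distance from c
Decimal('8E-18')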

And your point is that if we change the allowable choices to '0.1' or c, we find that float(a) == float(b) == float('0.1') (so repr(float(a)) is '0.1'), but a is closer to c than to 0.1. That distance, from a to 0.1, is less than 1 ulp, but more than 0.5 ulp.
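
And the ulp comparison, again as a sketch (2**-56 is the spacing of doubles just above 0.1):

>>> float(a) == float(b) == float('0.1')    # all three round to the same double
True
>>> ulp = Decimal(2) ** -56
>>> Decimal(c) - Decimal(a) < Decimal(a) - Decimal('0.1')   # a is closer to c than to 0.1
True
>>> ulp / 2 < Decimal(a) - Decimal('0.1') < ulp   # but by more than half an ulp
True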

I find the argument intriguing, but I blame it more on what happens in float(a) than on what Decimal() does to the resulting value. If you actually had the string a and wanted to convert it to Decimal, you would obviously write Decimal(a), not Decimal(float(a)), so this is really only a problem when someone writes a as a float literal that gets passed to Decimal, i.e. Decimal(0.100000000000000012).
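
For illustration, the difference the quotes make today (under my proposal only the second result would change, to Decimal('0.1')):

>>> Decimal('0.100000000000000012')    # the string keeps every digit
Decimal('0.100000000000000012')
>>> Decimal(0.100000000000000012)      # the float literal is rounded to a double first
Decimal('0.1000000000000000055511151231257827021181583404541015625')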

That's slightly unfortunate, but easy to fix by adding quotes. The only place where I think something like this might occur in real life is when someone copies a numerical recipe involving some very precise constants and mindlessly applies Decimal() to the constants without string quotes. But that's a "recipe" for failure anyway: if the recipe really uses more precision than IEEE double can handle, then *with* the quotes the calculation would come out more exact in the first place. Perhaps another scenario would be if the constant was calculated (by the recipe-maker) to within 0.5 ulp using IEEE double and rendered with exactly the right number of digits.

But these scenarios sound like either they should use the quotes anyway, or the calculation would be better off done in double rather than Decimal. So I think it's still pretty much a phantom problem.
 
Of course that's not true for all reals (0.1 being the obvious counterexample), but it's true for some with your proposal, while today it's true for none. So the mean absolute error in Decimal(repr(f)) across any range of reals is inherently higher than that of Decimal.from_float(f). Put another way, you're adding additional rounding error. That additional rounding error is still less than the rule-of-thumb cutoff that people use when talking about going through float, but it's nonzero and not guaranteed to cancel out.
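
One rough way to put numbers on that claim (just a sketch; the 25-digit sampling, the seed and the sample size are arbitrary choices):

import random
from decimal import Decimal, getcontext

getcontext().prec = 60
random.seed(1)

n = 10000
err_repr = err_exact = Decimal(0)
for _ in range(n):
    # a random real in [0, 1) with 25 decimal digits -- more than a double holds
    r = Decimal('0.' + ''.join(random.choice('0123456789') for _ in range(25)))
    f = float(r)                                  # nearest double to r
    err_repr += abs(Decimal(repr(f)) - r)         # proposed conversion
    err_exact += abs(Decimal.from_float(f) - r)   # today's exact conversion

print('mean error via repr:      ', err_repr / n)
print('mean error via from_float:', err_exact / n)
# expect the repr-based mean to come out somewhat larger (both are tiny)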

On top of that, the distribution of binary floats is uniform (well, more complicated than uniform because they have an exponent as well as a mantissa, but you know what I mean); the distribution of closest-repr values to binary floats is not.
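
A small demonstration of the non-uniformity (a sketch; it assumes a Python with math.nextafter, and the choice of 0.1 and five steps is arbitrary):

import math
from decimal import Decimal, getcontext

getcontext().prec = 60

x = 0.1
for _ in range(5):
    nxt = math.nextafter(x, 1.0)                       # the next double up
    float_gap = Decimal(nxt) - Decimal(x)              # always exactly 1 ulp here
    repr_gap = Decimal(repr(nxt)) - Decimal(repr(x))   # varies from step to step
    print(repr(x), float_gap, repr_gap)
    x = nxt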

I have no idea whether either of these is a property that users of Decimal (or, rather, Decimal and float together) care about. But they are properties that Decimal(float) has today that would be lost.

--
--Guido van Rossum (python.org/~guido)