[Python-ideas] Python Float Update

Steven D'Aprano steve at pearwood.info
Tue Jun 2 05:00:40 CEST 2015


Nicholas, 

Your email client appears not to be marking the text you quote. It is 
conventional to use a leading > for quoting; perhaps you could configure 
your mail program to do so? The good ones even have a "Paste As Quote" 
command.

On with the substance of your post...

On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote:

> I guess it’s a non-trivial tradeoff. But I would lean towards considering
> people likely to be affected by the performance hit as doing something “not
> common”. Like, if they are doing that many calculations that it matters,
> perhaps it makes sense to ask them to explicitly ask for floats vs.
> decimals, in exchange for giving the majority who wouldn’t notice a
> performance difference a better user experience.

Changing from binary floats to decimal floats by default is a big, 
backwards incompatible change. Even if it's a good idea, we're 
constrained by backwards compatibility: I would imagine we wouldn't want 
to even introduce this feature until the majority of people are using 
Python 3 rather than Python 2, and then we'd probably want to introduce 
it using a "from __future__ import decimal_floats" directive.

So I would guess this couldn't happen until probably 2020 or so.

But we could introduce a decimal literal, say 1.1d for Decimal("1.1"). 
The first prerequisite is that we have a fast Decimal implementation, 
which we now have. Next we would have to decide how the decimal literals 
would interact with the decimal module. Do we include full support of 
the entire range of decimal features, including globally configurable 
precision and other modes? Or just a subset? How will these decimals 
interact with other numeric types, like float and Fraction? At the 
moment, Decimal isn't even part of the numeric tower.
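
To make the interaction questions concrete, here is a rough sketch 
using today's decimal module (the 1.1d syntax itself is hypothetical, 
of course):

    from decimal import Decimal
    import numbers

    x = Decimal("1.1")         # roughly what a 1.1d literal might mean
    print(x + Decimal("2.2"))  # 3.3 -- exact in base 10

    # Decimal is registered as a Number, but not as a Real, so it sits
    # outside the numeric tower proper:
    print(isinstance(x, numbers.Number))  # True
    print(isinstance(x, numbers.Real))    # False

    # And mixing with float is currently an error, not a coercion:
    try:
        x + 0.5
    except TypeError as err:
        print(err)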

There's a lot of ground to cover; it's not a trivial change, and it 
will definitely need a PEP.


> How many of your examples are inherent limitations of decimals vs. problems
> that can be improved upon?

In one sense, they are inherent limitations of floating point numbers 
regardless of base. Whether binary, decimal, hexadecimal as used in some 
IBM computers, or something else, you're going to see the same problems. 
Only the specific details will vary, e.g. 1/3 cannot be represented 
exactly in base 2 or base 10, but if you constructed a base 3 float, it 
would be exact.
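
For example, the very same gotcha shows up in both bases (Fraction is 
included just for contrast, as an exact rational rather than a float):

    from decimal import Decimal
    from fractions import Fraction

    print(1 / 3)                    # 0.3333333333333333 (rounded, base 2)
    print(Decimal(1) / Decimal(3))  # 0.3333333333333333333333333333 (rounded, base 10)
    print(Fraction(1, 3))           # 1/3 -- exact, but not a float at all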

In another sense, Decimal has a big advantage in that it is much more 
configurable than Python's floats. Decimal lets you configure the 
precision, rounding mode, error handling and more. That's not inherent 
to base 10 calculations; you could do exactly the same thing for binary 
floats too, but Python doesn't offer that feature for floats, only for 
Decimals.
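
A small sketch of what that configurability looks like with the 
existing context machinery:

    from decimal import Decimal, localcontext, Inexact, ROUND_DOWN

    # Precision and rounding mode are properties of a context.
    with localcontext() as ctx:
        ctx.prec = 6
        ctx.rounding = ROUND_DOWN
        print(Decimal(1) / Decimal(7))   # 0.142857

    # Error handling is configurable too: trapping the Inexact signal
    # turns "the result had to be rounded" into a hard error.
    with localcontext() as ctx:
        ctx.traps[Inexact] = True
        try:
            Decimal(1) / Decimal(7)
        except Inexact:
            print("inexact result trapped")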

But no matter how you configure Decimal, all you can do is shift the 
gotchas around. The issue really is inherent to the nature of the 
problem, and you cannot defeat the universe. Regardless of what 
base you use, binary or decimal or something else, or how many digits 
precision, you're still trying to simulate an uncountably infinite, 
continuous, infinitely divisible number line using a finite, 
discontinuous set of possible values. Something has to give.

(For the record, when I say "uncountably infinite", I don't just mean 
"too many to count", it's a technical term. To oversimplify horribly, it 
means "larger than infinity" in some sense. It's off-topic for here, 
but if anyone is interested in learning more, you can email me off-list, 
or google for "countable vs uncountable infinity".)

Basically, you're trying to squeeze an infinite number of real numbers 
into a finite amount of memory. It can't be done. Consequently, there 
will *always* be some calculations where the true value simply cannot be 
calculated and the answer you get is slightly too big or slightly too 
small. All the other floating point gotchas follow from that simple 
fact.
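
A quick illustration of how the gotchas merely move around rather than 
disappear: switching to Decimal fixes the classic 0.1 + 0.2 surprise, 
but hands you a different one in exchange.

    from decimal import Decimal

    # The classic binary gotcha goes away in base 10...
    print(0.1 + 0.2 == 0.3)                                   # False
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

    # ...but a different one appears. In binary the rounding errors in
    # 1/3 happen to cancel when you multiply by 3 again; in Decimal,
    # with its default 28 digits, they don't.
    print((1 / 3) * 3 == 1)         # True, by a happy rounding accident
    print(Decimal(1) / 3 * 3 == 1)  # False: 0.999...9 != 1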


> Admittedly, the only place where I’ve played with decimals extensively is
> on Microsoft’s SQL Server (where they are the default literal
> <https://msdn.microsoft.com/en-us/library/ms179899.aspx>). I’ve stumbled in
> the past on my own decimal gotchas
> <http://dba.stackexchange.com/q/18997/2660>, but looking at your examples
> and trying them on SQL Server I suspect that most of the problems you show
> are problems of precision and scale.

No. Change the precision and scale, and some *specific* problems go 
away, but they reappear with other numbers.

Besides, at the point that you're talking about setting the precision, 
we're really not talking about making things easy for beginners any 
more.

And not all floating point issues are related to precision and scale in 
decimal. You cannot divide a cake into exactly three equal pieces in 
Decimal any more than you can divide a cake into exactly three equal 
pieces in binary. All you can hope for is to choose a precision where the 
rounding errors in one part of your calculation will be cancelled by the 
rounding errors in another part of your calculation. And that precision 
will be different for any two arbitrary calculations.
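
For instance, with the default 28 digits of precision the rounding 
error in 1/7 happens to cancel when you multiply by 7 again, while the 
error in 1/3 does not:

    from decimal import Decimal, getcontext

    print(getcontext().prec)        # 28, the default

    print(Decimal(1) / 7 * 7 == 1)  # True: the error in 1/7 cancels out
    print(Decimal(1) / 3 * 3 == 1)  # False: 0.333...3 * 3 is 0.999...9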



-- 
Steve

