[Tutor] int(1.99...99) = 1 and can = 2

Steven D'Aprano steve at pearwood.info
Sun May 1 06:43:38 EDT 2016


On Sun, May 01, 2016 at 01:02:50AM -0500, boB Stepp wrote:
> Life has kept me from Python studies since March, but now I resume.
> Playing around in the interpreter I tried:
> 
> py3: 1.9999999999999999
> 2.0
> py3: 1.999999999999999
> 1.999999999999999

Correct. Python floats are 64-bit IEEE-754 binary floats, which in 
practice means they can carry about 15-17 significant figures in 
decimal. 
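
You can query those limits at runtime (assuming the usual IEEE-754 
double platform):

py> import sys
py> sys.float_info.mant_dig   # bits in the significand
53
py> sys.float_info.dig        # decimal digits that are always safe
15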

Starting with Python 2.6, floats have "hex" and "fromhex" methods which 
allow you to convert them to and from base 16, which is more compact 
than the base 2 used internally but otherwise equivalent.

https://docs.python.org/2/library/stdtypes.html#float.hex
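
Because the hex form is exact, the conversion round-trips perfectly:

py> x = 0.1
py> float.fromhex(x.hex()) == x
True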

So here is your second example, shown in hex so we can get a better 
idea of the internal details:

py> (1.999999999999999).hex()
'0x1.ffffffffffffbp+0'

The "p+0" at the end shows the exponent, as a power of 2. (It can't use 
"e" or "E" like decimal, because that would be confused with the hex 
digit "e".)

You can see that the last hex digit is "b". If we add an extra digit to 
the end of the decimal 1.999999999999999, that final digit increases 
until we reach:

py> (1.9999999999999997).hex()
'0x1.fffffffffffffp+0'

1.9999999999999998 also gives us the same result. More on this later.
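
You can confirm that Python really does treat them as the same float:

py> 1.9999999999999997 == 1.9999999999999998
True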

If we increase the final decimal digit one more, we get:

py> (1.9999999999999999).hex()
'0x1.0000000000000p+1'

which is equal to decimal 2: a mantissa of 1 in hex, an exponent of 1 
in decimal, which gives 1*2**1 = 2.
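
And indeed Python agrees that this decimal literal is exactly 2.0:

py> 1.9999999999999999 == 2.0
True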


Given that we only have 64 bits for a float, and some of them are used 
for the exponent and the sign, it is inevitable that conversions to and 
from decimal must sometimes be inexact. Remember that I mentioned that 
both 1.9999999999999997 and 1.9999999999999998 are treated as the same 
float? That is because a 64-bit binary float does not have enough 
binary digits of precision to distinguish them. You would need more 
than 64 bits to tell them apart. And so, following the IEEE-754 
standard (the best practice for floating point arithmetic), both 
numbers are rounded to the nearest possible float.
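
If you want to see the exact value that both decimals round to, the 
fractions module will show it:

py> from fractions import Fraction
py> Fraction(1.9999999999999997)
Fraction(9007199254740991, 4503599627370496)
py> Fraction(1.9999999999999998)
Fraction(9007199254740991, 4503599627370496)

That fraction is exactly (2**53 - 1)/2**52, the largest float below 2.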

Why the nearest possible float? Because any other choice, such as 
"always round down", or "always round up", or "round up on Tuesdays", 
will have *larger* rounding errors. Rounding errors are inescapable, but 
we do what we can to keep them as small as possible. So, decimal 
strings like 1.999...97 generate the binary float with the smallest 
possible error.
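
We can check that the error really is within half a ULP ("unit in the 
last place") using exact rational arithmetic; a quick sketch:

py> from fractions import Fraction
py> exact = Fraction("1.9999999999999997")   # the decimal, exactly
py> stored = Fraction(1.9999999999999997)    # the float, exactly
py> abs(exact - stored) <= Fraction(1, 2**53)  # half a ULP near 2
True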

(In fact, the IEEE-754 standard requires that the rounding mode be 
user-configurable. Unfortunately, most C maths libraries do not provide 
that functionality, or if they do, it is not reliable.)

A diagram might help make this more clear. This ASCII art is best viewed 
using a fixed-width font like Courier.

Suppose we look at every single float between 1 and 2. Since they use
a finite number of bits, there are a finite number of equally spaced 
floats between any two consecutive whole numbers. But because they are 
in binary, not decimal, they won't line up exactly with decimal 
fractions, except for numbers like 0.5, 0.25 etc. So:


1 _____ | _____ | _____ | _____ | ... | _____ | _____ | _____ 2
---------------------------------------------------^----^---^
                                                   a    b   c


The first arrow ^ marked as "a" represents the true position of 
1.999...97 and the second, "b", represents the true position of 
1.999...98. Since they don't line up exactly with the binary float 
0x1.ffff....ff, there is some rounding error, but it is the smallest 
error possible.

The third arrow, marked as "c", represents 1.999...99.
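
The spacing of those tick marks is easy to compute: between 1.0 and 
2.0, consecutive floats are exactly 2**-52 apart:

py> 2**-52           # spacing between consecutive floats in [1.0, 2.0)
2.220446049250313e-16
py> 2.0 - 2**-52     # the largest float below 2.0, i.e. 0x1.fffffffffffffp+0
1.9999999999999998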



> py3: int(1.9999999999999999)
> 2
> py3: int(1.999999999999999)
> 1

The int() function always truncates. So in the first case, your float 
starts off as 2.0 (as seen above), and then int() truncates it to 2. 
The second case starts off with the float 1.999999999999999, which is

'0x1.ffffffffffffbp+0'

which int() then truncates to 1.
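
Note that truncation means rounding towards zero, not rounding down; a 
negative value makes the difference clear:

py> int(-1.999999999999999)          # truncates towards zero
-1
py> import math
py> math.floor(-1.999999999999999)   # rounds down, towards negative infinity
-2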


> It has been many years since I did problems in converting decimal to
> binary representation (Shades of two's-complement!), but I am under
> the (apparently mistaken!) impression that in these 0.999...999
> situations that the floating point representation should not go "up"
> in value to the next integer representation.

In ancient days, by which I mean before the 1980s, there was 
no agreement on how floats should be rounded by computer manufacturers. 
Consequently they all used their own rules, which contradicted the rules 
used by other manufacturers, and sometimes even their own. But in the 
early 80s, a consortium of companies including Apple, Intel and others 
got together and agreed on best practices (give or take a few 
compromises) for computer floating point maths. One of those is that the 
default rounding mode should be round to nearest, so as to minimize the 
errors. Otherwise, if you always round down, then errors accumulate 
faster.

We can test this with the fractions and decimal modules:

py> from fractions import Fraction
py> f = Fraction(0)
py> for i in range(1, 100):
...     f += Fraction(1)/i
...
py> f
Fraction(360968703235711654233892612988250163157207, 
69720375229712477164533808935312303556800)
py> float(f)
5.17737751763962

So that tells us the exact result of adding the reciprocals of 1 
through 99, and the nearest binary float to that exact result. Now 
let's do it again, only this time with limited precision:

py> from decimal import *
py> d = Decimal(0)
py> with localcontext() as ctx:
...     ctx.prec = 5
...     for i in range(1, 100):
...             d += Decimal(1)/i
...
py> d
Decimal('5.1773')

That's not too bad: four out of the five significant figures are 
correct, and the fifth is only off by one. (It should be 5.1774 if we 
added exactly, and rounded only at the end.) But if we change to always 
round down:

py> d = Decimal(0)
py> with localcontext() as ctx:
...     ctx.prec = 5
...     ctx.rounding = ROUND_DOWN
...     for i in range(1, 100):
...             d += Decimal(1)/i
...
py> d
Decimal('5.1734')


we're now way off: only three significant figures are correct, and the 
fourth is off by 4.

Obviously this is an extreme case, for demonstration purposes only. But 
the principle is the same for floats: the IEEE-754 promise that simple 
arithmetic is correctly rounded ensures that the errors are as small 
as possible.
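
You can see that promise at work by comparing a naive float summation 
of the same series against the exact rational sum; a rough sketch:

py> from fractions import Fraction
py> exact = sum(Fraction(1, i) for i in range(1, 100))  # exact rational sum
py> naive = sum(1.0/i for i in range(1, 100))           # rounds at every step
py> abs(float(exact) - naive) < 1e-13                   # the error stays tiny
True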


-- 
Steve
