[Python-ideas] Python Numbers as Human Concept Decimal System

Sun Mar 9 22:33:41 CET 2014

[Guido]
> ...
> I did do some more thinking about how the magic repr() affects the
> distribution of values, and came up with an example of sorts that might show
> what it does. We've mostly focused on simple values like 1.1, but to understand the
> distribution issue it's better to look at a very large value.
>
> I took 2**49 as an example and added a random fraction. When printed this
> always gives a single digit past the decimal point, e.g. 562949953421312.5.
> Then I measured the distribution of the last digit. What I found matched my
> prediction: the digits 0, 1, 2, 4, 5, 6, 8, 9 occurred with roughly equal
> probability (1/8th). So 3 and 7 are completely missing.
>
> The explanation is simple enough: using the (current) Decimal class it's
> easy to see that there are only 8 possible actual values, whose fractional
> part is a multiple of 1/8. IOW the exact values end in .000, .125, .250,
> .375, .500, .625, .750, .875. (*) The conclusion is that there are only 3
> bits represented after the binary point, and repr() produces a single digit
> here, because that's all that's needed to correctly round back to the 8
> possible values. So it picks the digit closest to each of the possible
> values, and when there are two possibilities it picks one. I don't know how
> it picks, but it is reproducible -- in this example it always chooses .2 to
> represent .250, and .8 to represent .750.

Just the usual round-to-nearest/even.  0.25 and 0.75 are exactly
halfway, so round to the closest even retained digit (0.2 and 0.8).
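
A minimal sketch of that, for anyone who wants to see it directly.  It
assumes only the stdlib decimal module; BASE and rows are names made
up for illustration.  It walks the 8 representable fractions above
2**49, and prints each float's exact value, its repr(), and the exact
value rounded to one decimal place with ties going to even:

```python
from decimal import Decimal, ROUND_HALF_EVEN

BASE = 2**49  # 562949953421312; floats this large are spaced 1/8 apart

rows = []
for k in range(8):
    x = BASE + k / 8      # every multiple of 1/8 here is exactly representable
    exact = Decimal(x)    # the float's exact value, written out in decimal
    # Round the exact value to one decimal place, ties going to the even digit.
    rounded = exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
    rows.append((str(exact), repr(x), str(rounded)))

for exact_text, repr_text, rounded_text in rows:
    print(exact_text, repr_text, rounded_text)
```

The last two columns should agree on every row, with .250 shown as .2
and .750 shown as .8.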

> ...
> (*) I was momentarily startled to find that the set of Decimals produced
> contained 9 items, until I realized that some random() call must have
> produced a value close enough to 1 to be rounded up.

Not rare!  About half the trailing zeroes come from rounding up
fractions >= 0.9375 (1-1/16), and the other half from rounding down
fractions <= 0.0625 (1/16).  For example, here's the full distribution
across 10,000 tries:

562949953421312.0   631
562949953421312.1  1230
562949953421312.2  1224
562949953421312.4  1270
562949953421312.5  1245
562949953421312.6  1287
562949953421312.8  1271
562949953421312.9  1206
562949953421313.0   636

Except that .3 and .7 are missing (and 2 values must be missing
regardless of rounding rules, since only 8 distinct fractions are
spread over 10 possible digits), there's no bias toward higher or lower
numbers.  There is a bias toward trailing even digits, but that's an
intended consequence of the "even" in "round-to-nearest/even".
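
A rough sketch of how such a tally can be reproduced, assuming the
experiment is just "add random() to 2**49 and count the repr()
strings".  The seed is arbitrary and the names are illustrative, so
the exact counts will differ from the ones quoted above:

```python
import random
from collections import Counter

BASE = 2**49
rng = random.Random(2014)  # arbitrary seed, only so that reruns match

# The float addition itself rounds to the nearest multiple of 1/8;
# repr() then picks the shortest string that round-trips.
counts = Counter(repr(BASE + rng.random()) for _ in range(10_000))

for text in sorted(counts):
    print(text, counts[text])
```

The two end values should each get about half as many hits as the
others, and nothing ending in .3 or .7 should show up.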

BTW, I still like changing Decimal(a_float) to work with
repr(a_float).  It's more-than-less unprincipled, but it would reduce
surprises for non-experts.  Ya, I know they may eventually bump into a
non-monotonic case (etc), but by the time they can _spell_ "monotonic"
they won't be newbies anymore ;-)
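
To make the difference concrete, here's a small sketch comparing what
Decimal(a_float) does today with what going through repr() would give.
The variable names are made up, and Decimal(repr(x)) is only a
stand-in for the suggested behavior, not an existing option:

```python
from decimal import Decimal

x = 1.1

exact = Decimal(x)           # current behavior: the float's exact binary value
via_repr = Decimal(repr(x))  # stand-in for the suggestion: use the shortest repr

print(exact)     # a long expansion that starts 1.1000000000000000888...
print(via_repr)  # 1.1
```

Both results convert back to the same float, so nothing is lost as far
as round-tripping goes; the second is just what a non-expert expects
to see.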

For a while I hung out on Stack Overflow answering "Python fp is
broken!" complaints.  One thing I learned is that while the
questioners always had a shallow (to be charitable) understanding of
fp issues, exceedingly few had the slightest interest in gaining
deeper knowledge.  They just wanted their immediate problem to go
away, happy to accept any quick hack.  Will they get burned again
later?  Sure.  But you can't teach the unteachable.  What you can do
is delay the day they realize they're in trouble again ;-)