[Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.136,1.137

Fri, 08 Jun 2001 09:25:01 -0700

Update of /cvsroot/python/python/dist/src/Doc/tut
In directory usw-pr-cvs1:/tmp/cvs-serv13703/tut

Modified Files:
	tut.tex 
Log Message:

Text from Tim & Guido discussing floating point arithmetic and what users
need to understand about the binary & decimal fp, so that representation
weirdness is documented somewhere.  This makes it easier to repond to "bug"
reports caused by user confusion & ignorance of the issues.

This closes SF patch #426208.

Index: tut.tex
===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/tut/tut.tex,v
retrieving revision 1.136
retrieving revision 1.137
diff -C2 -r1.136 -r1.137
*** tut.tex	2001/05/22 06:54:14	1.136
--- tut.tex	2001/06/08 16:24:58	1.137
***************
*** 4086,4088 ****
--- 4086,4353 ----

+ \chapter{Floating Point Arithmetic:  Issues and Limitations
+          \label{fp-issues}}
+ 
+ Floating-point numbers are represented in computer hardware as
+ base 2 (binary) fractions.  For example, the decimal fraction
+ 
+ \begin{verbatim}
+ 0.125
+ \end{verbatim}
+ 
+ has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction
+ 
+ \begin{verbatim}
+ 0.001
+ \end{verbatim}
+ 
+ has value 0/2 + 0/4 + 1/8.  These two fractions have identical values,
+ the only real difference being that the first is written in base 10
+ fractional notation, and the second in base 2.
+ 
+ Unfortunately, most decimal fractions cannot be represented exactly as
+ binary fractions.  A consequence is that, in general, the decimal
+ floating-point numbers you enter are only approximated by the binary
+ floating-point numbers actually stored in the machine.
+ 
+ The problem is easier to understand at first in base 10.  Consider the
+ fraction 1/3.  You can approximate that as a base 10 fraction:
+ 
+ \begin{verbatim}
+ 0.3
+ \end{verbatim}
+ 
+ or, better,
+ 
+ \begin{verbatim}
+ 0.33
+ \end{verbatim}
+ 
+ or, better,
+ 
+ \begin{verbatim}
+ 0.333
+ \end{verbatim}
+ 
+ and so on.  No matter how many digits you're willing to write down, the
+ result will never be exactly 1/3, but will be an increasingly better
+ approximation to 1/3.
+ 
+ In the same way, no matter how many base 2 digits you're willing to
+ use, the decimal value 0.1 cannot be represented exactly as a base 2
+ fraction.  In base 2, 1/10 is the infinitely repeating fraction
+ 
+ \begin{verbatim}
+ 0.0001100110011001100110011001100110011001100110011...
+ \end{verbatim}
+ 
+ Stop at any finite number of bits, and you get an approximation.  This
+ is why you see things like:
+ 
+ \begin{verbatim}
+ >>> 0.1
+ 0.10000000000000001
+ \end{verbatim}
+ 
+ On most machines today, that is what you'll see if you enter 0.1 at
+ a Python prompt.  You may not, though, because the number of bits
+ used by the hardware to store floating-point values can vary across
+ machines, and Python only prints a decimal approximation to the true
+ decimal value of the binary approximation stored by the machine.  On
+ most machines, if Python were to print the true decimal value of
+ the binary approximation stored for 0.1, it would have to display
+ 
+ \begin{verbatim}
+ >>> 0.1
+ 0.1000000000000000055511151231257827021181583404541015625
+ \end{verbatim}
+ 
+ instead!  The Python prompt (implicitly) uses the builtin
+ \function{repr()} function to obtain a string version of everything it
+ displays.  For floats, \code{repr(\var{float})} rounds the true
+ decimal value to 17 significant digits, giving
+ 
+ \begin{verbatim}
+ 0.10000000000000001
+ \end{verbatim}
+ 
+ \code{repr(\var{float})} produces 17 significant digits because it
+ turns out that's enough (on most machines) so that
+ \code{eval(repr(\var{x})) == \var{x}} exactly for all finite floats
+ \var{x}, but rounding to 16 digits is not enough to make that true.
+ 
+ Note that this is in the very nature of binary floating-point: this is
+ not a bug in Python, it is not a bug in your code either, and you'll
+ see the same kind of thing in all languages that support your
+ hardware's floating-point arithmetic.
+ 
+ Python's builtin \function{str()} function produces only 12
+ significant digits, and you may wish to use that instead.  It's
+ unusual for \code{eval(str(\var{x}))} to reproduce \var{x}, but the
+ output may be more pleasant to look at:
+ 
+ \begin{verbatim}
+ >>> print str(0.1)
+ 0.1
+ \end{verbatim}
+ 
+ It's important to realize that this is, in a real sense, an illusion:
+ the value in the machine is not exactly 1/10, you're simply rounding
+ the \emph{display} of the true machine value.
+ 
+ Other surprises follow from this one.  For example, after seeing
+ 
+ \begin{verbatim}
+ >>> 0.1
+ 0.10000000000000001
+ \end{verbatim}
+ 
+ you may be tempted to use the \function{round()} function to chop it
+ back to the single digit you expect.  But that makes no difference:
+ 
+ \begin{verbatim}
+ >>> round(0.1, 1)
+ 0.10000000000000001
+ \end{verbatim}
+ 
+ The problem is that the binary floating-point value stored for "0.1"
+ was already the best possible binary approximation to 1/10, so trying
+ to round it again can't make it better:  it was already as good as it
+ gets.
+ 
+ Another consequence is that since 0.1 is not exactly 1/10, adding 0.1
+ to itself 10 times may not yield exactly 1.0, either:
+ 
+ \begin{verbatim}
+ >>> sum = 0.0
+ >>> for i in range(10):
+ ...     sum += 0.1
+ ...
+ >>> sum
+ 0.99999999999999989
+ \end{verbatim}
+ 
+ Binary floating-point arithmetic holds many surprises like this.  The
+ problem with "0.1" is explained in precise detail below, in the
+ "Representation Error" section.  See
+ \citetitle[http://www.lahey.com/float.htm]{The Perils of Floating
+ Point} for a more complete account of other common surprises.
+ 
+ As that says near the end, ``there are no easy answers.''  Still,
+ don't be unduly wary of floating-point!  The errors in Python float
+ operations are inherited from the floating-point hardware, and on most
+ machines are on the order of no more than 1 part in 2**53 per
+ operation.  That's more than adequate for most tasks, but you do need
+ to keep in mind that it's not decimal arithmetic, and that every float
+ operation can suffer a new rounding error.
+ 
+ While pathological cases do exist, for most casual use of
+ floating-point arithmetic you'll see the result you expect in the end
+ if you simply round the display of your final results to the number of
+ decimal digits you expect.  \function{str()} usually suffices, and for
+ finer control see the discussion of Pythons's \code{\%} format
+ operator: the \code{\%g}, \code{\%f} and \code{\%e} format codes
+ supply flexible and easy ways to round float results for display.
+ 
+ 
+ \section{Representation Error
+          \label{fp-error}}
+ 
+ This section explains the ``0.1'' example in detail, and shows how
+ you can perform an exact analysis of cases like this yourself.  Basic
+ familiarity with binary floating-point representation is assumed.
+ 
+ \dfn{Representation error} refers to that some (most, actually)
+ decimal fractions cannot be represented exactly as binary (base 2)
+ fractions.  This is the chief reason why Python (or Perl, C, \Cpp,
+ Java, Fortran, and many others) often won't display the exact decimal
+ number you expect:
+ 
+ \begin{verbatim}
+ >>> 0.1
+ 0.10000000000000001
+ \end{verbatim}
+ 
+ Why is that?  1/10 is not exactly representable as a binary fraction.
+ Almost all machines today (November 2000) use IEEE-754 floating point
+ arithmetic, and almost all platforms map Python floats to IEEE-754
+ "double precision".  754 doubles contain 53 bits of precision, so on
+ input the computer strives to convert 0.1 to the closest fraction it can
+ of the form \var{J}/2**\var{N} where \var{J} is an integer containing
+ exactly 53 bits.  Rewriting
+ 
+ \begin{verbatim}
+  1 / 10 ~= J / (2**N)
+ \end{verbatim}
+ 
+ as
+ 
+ \begin{verbatim}
+ J ~= 2**N / 10
+ \end{verbatim}
+ 
+ and recalling that \var{J} has exactly 53 bits (is \code{>= 2**52} but
+ \code{< 2**53}), the best value for \var{N} is 56:
+ 
+ \begin{verbatim}
+ >>> 2L**52
+ 4503599627370496L
+ >>> 2L**53
+ 9007199254740992L
+ >>> 2L**56/10
+ 7205759403792793L
+ \end{verbatim}
+ 
+ That is, 56 is the only value for \var{N} that leaves \var{J} with
+ exactly 53 bits.  The best possible value for \var{J} is then that
+ quotient rounded:
+ 
+ \begin{verbatim}
+ >>> q, r = divmod(2L**56, 10)
+ >>> r
+ 6L
+ \end{verbatim}
+ 
+ Since the remainder is more than half of 10, the best approximation is
+ obtained by rounding up:
+ 
+ \begin{verbatim}
+ >>> q+1
+ 7205759403792794L
+ \end{verbatim}
+ 
+ Therefore the best possible approximation to 1/10 in 754 double
+ precision is that over 2**56, or
+ 
+ \begin{verbatim}
+ 7205759403792794 / 72057594037927936
+ \end{verbatim}
+ 
+ Note that since we rounded up, this is actually a little bit larger than
+ 1/10; if we had not rounded up, the quotient would have been a little
+ bit smaller than 1/10.  But in no case can it be *exactly* 1/10!
+ 
+ So the computer never ``sees'' 1/10:  what it sees is the exact
+ fraction given above, the best 754 double approximation it can get:
+ 
+ \begin{verbatim}
+ >>> .1 * 2L**56
+ 7205759403792794.0
+ \end{verbatim}
+ 
+ If we multiply that fraction by 10**30, we can see the (truncated)
+ value of its 30 most significant decimal digits:
+ 
+ \begin{verbatim}
+ >>> 7205759403792794L * 10L**30 / 2L**56
+ 100000000000000005551115123125L
+ \end{verbatim}
+ 
+ meaning that the exact number stored in the computer is approximately
+ equal to the decimal value 0.100000000000000005551115123125.  Rounding
+ that to 17 significant digits gives the 0.10000000000000001 that Python
+ displays (well, will display on any 754-conforming platform that does
+ best-possible input and output conversions in its C library --- yours may
+ not!).
+ 
  \end{document}