[Python-Dev] Backport new float repr to Python 2.7?

Sun Oct 11 20:28:11 CEST 2009

In a recent #python-dev IRC conversation, it was suggested that we
should consider backporting the new-style float repr from py3k to
trunk.  I'd like to get people's opinions on this idea.

To recap quickly, the algorithm for computing the repr of floats changed
between Python 2.x and Python 3.x (well, actually between 3.0 and 3.1,
but 3.0 is dead):

 - in Python 2.x, repr(x) computes 17 significant decimal digits, and
   then strips trailing zeros.  In other words, it's pretty much identical
   to doing '%.17g' % x.  The computation is done using the platform's
   *printf functions.

 - in Python 3.x, repr(x) returns the shortest decimal string that's
   guaranteed to evaluate back to the float x under correct rounding.
   The computation is done using David Gay's dtoa.c code, adapted
   for inclusion in Python (in file Python/dtoa.c).
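
For anyone who wants to see the two rules side by side on a current
interpreter, here's a rough sketch (Python 3).  The helper names
old_style_repr and shortest_roundtrip are made up for illustration.
The first only approximates the 2.x rule via '%.17g'.  The second is a
brute-force search, not David Gay's algorithm, so its output can differ
from repr in formatting (e.g. '1' rather than '1.0').

```python
def old_style_repr(x):
    """Approximate the 2.x rule: 17 significant digits, trailing zeros stripped."""
    return '%.17g' % x


def shortest_roundtrip(x):
    """Brute-force search for a short digit string that converts back to x.

    Illustration only: tries 1..17 significant digits in turn, which is
    not how dtoa.c computes its result.
    """
    for ndigits in range(1, 18):
        s = '%.*g' % (ndigits, x)
        if float(s) == x:
            return s
    # Fallback only: 17 significant digits are enough for IEEE 754 doubles.
    return '%.17g' % x


for x in (0.02, 0.03, 0.1, 1.5):
    print(old_style_repr(x), shortest_roundtrip(x), repr(x))
    # The new repr always converts back to exactly the same float.
    assert float(repr(x)) == x
```

The point of the comparison is that both strings convert back to the
same float; the new rule just picks the shortest such string.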

There are (in my view) many benefits to the new approach.  Among
them:

 - fewer newbie complaints and questions (on c.l.p, IRC, Stack
   Overflow, etc.) about Python 'rounding incorrectly'. Whether this is a
   good thing or not is a matter of some debate (I'm tempted to
   borrow the time machine and simply say 'see the replies
   to this message'!)

 - string to float *and* float to string conversions are both guaranteed
   correctly rounded in 3.x: David Gay's code implements the conversion
   in both directions, and having correctly rounded string -> float
   conversions is essential to ensure that eval(repr(x)) recovers x exactly.

 - the repr of round(x, n) really does have at most n digits after the
   point, giving the semi-illusion that x really has been rounded exactly,
   and eliminating one of the most common user complaints about the
   round function.

 - round(x, n) agrees exactly with '{:.{}f}'.format(x, n)  (this isn't
   true in Python 2.x, and the difference is a cause of bug reports)

 - useful side effects, such as float(x) rounding correctly for
   Decimal instances x.

 - the output from the new rule is more consistent: the 'strip trailing
   zeros' part of the old rule has some strange consequences:  e.g.,
   in 2.x right now (on a typical machine):

   >>> 0.02
   0.02
   >>> 0.03
   0.029999999999999999

   even though neither 0.02 nor 0.03 can be exactly represented
   in binary.  3.x gives '0.02' and '0.03'.

 - repr(x) is consistent across platforms (or at least across platforms
   with IEEE 754 doubles;  in practice this seems to account for
   virtually all platforms currently running Python).

 - the float <-> string conversions are under our control, so any bugs
   found can be fixed in the Python source.  There's no shortage of
   conversion bugs in the wild, and certainly bugs have been observed in
   OS X, Linux and Windows.  (The ones I found in OS X 10.5 have
   been fixed in OS X 10.6, though.)
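
A few of the points above (round versus format, the repr of a rounded
value, and float() on Decimal instances) can be checked with something
like the following sketch on Python 3.  The function name
round_matches_format and the sample values are my own choices for
illustration, not anything taken from the implementation.

```python
from decimal import Decimal


def round_matches_format(x, n):
    """True if round(x, n) equals the float obtained from '%.nf' formatting."""
    return round(x, n) == float('{:.{}f}'.format(x, n))


# Values chosen because they sit close to (or exactly on) a rounding boundary.
samples = [(2.675, 2), (0.125, 2), (1.005, 2), (123.456, 1)]

for x, n in samples:
    rounded = round(x, n)
    # Digits after the point in the repr of the rounded value.
    digits_after_point = len(repr(rounded).split('.')[1])
    print(x, n, repr(rounded), '{:.{}f}'.format(x, n),
          round_matches_format(x, n), digits_after_point <= n)

# float() applied to a Decimal gives the same result as the float literal.
print(float(Decimal('0.1')) == 0.1)
```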

Possible problems:

 - breaking docstrings in third party code.  Though Eric reminded me
   that when we implemented this for 3.1, there were essentially no
   standard library test breakages resulting from the changed repr
   format.

 - some might argue that the new repr (and round) just allows users
   to remain ignorant of floating-point difficulties for longer, and that
   this is a bad thing.  I don't really buy either of these points.

 - someone has to put in the work.  As mentioned below, I'm happy
   to do this (and Eric's offered to help, without which this probably
   wouldn't be feasible at all), but it'll use cycles that I could also
   usefully be spending elsewhere.
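
To make the docstring concern concrete, here's a small sketch using the
standard doctest module.  The two sample docstrings and the helper
count_failures are hypothetical; the idea is just that a doctest which
hard-codes the old 17-digit output stops passing once the repr changes.

```python
import doctest

# A doctest written against the old-style output.
OLD_STYLE_DOC = """
>>> 0.03
0.029999999999999999
"""

# The same doctest written against the new-style output.
NEW_STYLE_DOC = """
>>> 0.03
0.03
"""


def count_failures(doc):
    """Run the examples in a docstring and return how many of them fail."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(doc, {}, 'sample', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


print(count_failures(OLD_STYLE_DOC), count_failures(NEW_STYLE_DOC))
```

On an interpreter with the new repr, the first docstring fails and the
second passes; with the old repr it would be the other way round.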

I'm mostly neutral on the backport idea:  I'm very happy that this is
in 3.x, but don't see any great need to backport it.  But if there's
majority (+BDFL) support, I'm willing to put the work in to do the
backport.

Masochists who are still reading by this point and who want more
information about the new repr implementation can see the issue
discussion:

http://bugs.python.org/issue1580

Thoughts?

Mark