Rounding Bug in Python 2.0! - ugh

Thu Nov 2 00:43:07 EST 2000

[Pete Forman]
> It would be nice if Python would make use of dtoa() rather than stock
> sprintf() to display floating-point numbers.  There are variants of
> dtoa in netlib and libg++.

[Moshe Zadka]
> netlib and libg++ aren't always available.

"netlib" isn't a library; it's a large repository of scientific code, mostly
Fortran:  http://www.netlib.org/.  David Gay's float<->string conversion
code can be gotten from there.

> Python is meant to be portable...

Which may be a real problem, as Gay's code is excruciating.

> And anyway, the problem isn't necessarily one of display: you have two
> ways to display floats:
>
> str -- inaccurate but readable
> repr -- (more) accurate but unreadable

Pete's msg was unclear, but he's actually arguing for a 3rd way (which is
one of several modes Gay's code *can* be run in):

  repr -- produce the shortest string sufficient to exactly reproduce
          the input value

But then we also have to replace the input routines, as getting away with
that choice requires I/O accuracy greater than is required by the 754 std
(so we can't count on vendor input routines "to work" well enough).
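
To make the "shortest string" idea concrete, here's a brute-force sketch.  It
is *not* Gay's algorithm, just a loop that tries more and more digits until
the result reads back exactly.  It assumes a current Python 3 whose floats are
754 doubles and whose string<->float conversions are correctly rounded; the
name shortest_roundtrip is made up for illustration.

```python
def shortest_roundtrip(x):
    """Return the first %g-style string, by digit count, that reads back as x."""
    # Assumption: 17 significant digits always suffice for a 754 double.
    for ndigits in range(1, 18):
        s = "%.*g" % (ndigits, x)
        if float(s) == x:
            return s
    # Only reached for values that never compare equal (NaN).
    return "%.17g" % x

short = shortest_roundtrip(0.1)   # few digits, yet reads back exactly
long_form = "%.17g" % 0.1         # the fixed-width alternative
```

Both strings read back as the same double; the difference is only in how many
digits a human has to look at.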

[back to Pete]
> The criteria that I'm looking to satisfy are:
>
>   1) Round trip - reading a written number should give the same result
>   2) Write the minimum number of digits - drop trailing zeros/nines

[back to Moshe]
> Unfortunately, it's not well defined if you won't cross-computer
> round-trip. And you better believe that it's needed -- if you write out
> your data file, you might want to read it on a computer in your home too.

Provided we replace *all* Python FP I/O by Gay's routines, it would be
portable across all platforms where Gay's routines actually work and
Python's floats map to IEEE-754 doubles.  The current trick of using a
%.17g format for repr(float) is x-platform under exactly the same
assumption, plus the assumption that the platform FP I/O routines meet
754's minimum requirements for FP I/O accuracy.

It's hardly a panacea, though; for example, picture a machine whose floats
carry only 2 significant bits.  The number 0.75 is exactly representable,
with the two closest representable numbers on either side being 0.5 and 1.0,
so that "shortest string sufficient to reproduce the input" would produce
"0.8" or "0.7" depending on rounding mode.  People would whine about that
too.  That 754 doubles have 53 bits just moves the digit position where the
complaints start <0.5 wink>.
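
The toy case can be checked with exact rational arithmetic.  A rough sketch,
assuming round-to-nearest on input, so that any decimal strictly between the
midpoints to the two neighbors reads back as 0.75:

```python
from fractions import Fraction

# The toy format's value 0.75 and its two representable neighbors.
below, value, above = Fraction(1, 2), Fraction(3, 4), Fraction(1)

# Decimals strictly between these midpoints round to 0.75 on input.
lo = (below + value) / 2   # 0.625
hi = (value + above) / 2   # 0.875

# One-digit decimals 0.0 .. 0.9 that land inside that interval.
one_digit = [d for d in range(10) if lo < Fraction(d, 10) < hi]
candidates = ["0.%d" % d for d in one_digit]

# How far each candidate sits from the true value.
distances = [abs(Fraction(d, 10) - value) for d in one_digit]
```

Two one-digit strings survive and they are equally far from 0.75, which is
why the rounding mode gets to pick.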

> life-just-sucks-ly y'rs, Z.

Na, what sucks is that Python implicitly invokes repr() far more often than
you really want to see it (but you're fixing that <wink>).

[Gareth McCaughan]
> One possibility would be for Python to use, instead of the
> system-provided sprintf, something like the algorithm described
> in the paper "Printing floating-point numbers quickly and
> accurately", by Dybvig and, er, someone whose name I've
> forgotten. This guarantees to print the shortest string
> that, when read in, gives exactly the value it's passed.

That's what Gay's routines do; see above.

> I'm not sure how fast this can be made, though; this might
> be a reason not to do it.

Gay's routines are faster than the ones in the papers (you're missing the
companion paper by Steele & White), but are nevertheless almost certainly
slower than most sloppier platform routines (indeed, 754 doesn't require
correct rounding for FP I/O because there is no known method for doing that
efficiently in all cases).

[brucemdawson at my-deja.com]
> One could argue that if repr() is supposed to show as many
> digits as possible then it should print the exact number - instead
> of just a more accurate approximation.

No.  repr(float)'s main goal in life is that eval(repr(x)) == x for all
finite float x.  That's all.
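
A spot check of that invariant, as a sketch only: it samples random 64-bit
patterns rather than proving anything, and it assumes a current Python 3 on a
box where floats are 754 doubles.  The helper name random_finite_double is
invented for the example.

```python
import math
import random
import struct

def random_finite_double(rng):
    """Reinterpret random 64-bit patterns as doubles, skipping inf and NaN."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if math.isfinite(x):
            return x

rng = random.Random(2000)  # fixed seed so the run is repeatable
samples = [random_finite_double(rng) for _ in range(10000)]

# The invariant under test: eval(repr(x)) == x for every finite sample.
invariant_holds = all(eval(repr(x)) == x for x in samples)
```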

> Of course, many versions of sprintf (well, VisualC++ at least) won't
> print more than about seventeen useful digits,

Exactly 17, because that's the minimum you can produce and still guarantee
the invariant eval(repr(x)) == x for all finite x ranging over 754 doubles.
754 requires that platform float I/O be "good enough" to meet that
invariant, provided at least 17 significant digits are produced, but doesn't
require much more than that.  And in the interests of speed and utter lack
of customer demand, most vendors settle for the minimum.
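
For a concrete case where 16 digits fall short, here's a small sketch.  It
assumes 754 doubles and correctly rounded %g formatting, which is what a
current Python 3 provides:

```python
x = 0.1 + 0.2          # not the same double as 0.3

s16 = "%.16g" % x      # one digit too few for this value
s17 = "%.17g" % x      # enough for any 754 double

lost_at_16 = float(s16) != x
kept_at_17 = float(s17) == x
```

At 16 digits the string collapses to the neighboring double's spelling, and
reading it back yields a different value.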

> so we'd have to use dtoa for that, and it's not clear that printing
> the exact value would actually make people any happier :-)

Not repr's job anyway.
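
For anyone who does want the exact value, a sketch using the standard decimal
module, which is one way to get it without touching repr.  This assumes a
Python recent enough that Decimal accepts a float directly:

```python
from decimal import Decimal

# Converting a float to Decimal is exact: no rounding is applied.
exact = Decimal(0.1)

# Number of significant decimal digits in the exact expansion.
digits = len(exact.as_tuple().digits)

# The exact value differs from the decimal literal 0.1.
differs_from_literal = exact != Decimal("0.1")
```

That's a lot of digits to say "about a tenth", which is rather the point.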

don't-everyone-volunteer-at-once<wink>-ly y'rs  - tim