Round Bug in Python 1.6?

Hi,

as a side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch.

Is this changed for 1.6, or might it be my bug?

D:\python\spc>python
Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999
>>> ^Z

D:\python>python
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1416
>>> ^Z

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com

Chris> I happened to observe the following rounding bug. It happens in
Chris> Stackless Python, which is built against the pre-unicode CVS
Chris> branch.

Chris> Is this changed for 1.6, or might it be my bug?

I doubt it's your problem. I see it too with 1.6a2 (no stackless):

% ./python
Python 1.6a2 (#2, Apr 6 2000, 15:27:22) [GCC pgcc-2.91.66 19990314 (egcs-1.1.2 release)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999

Same behavior whether compiled with -O2 or -g.

--
Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/

as a side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch.
Is this changed for 1.6, or might it be my bug?
D:\python\spc>python
Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999
>>> ^Z

D:\python>python
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1416
>>> ^Z

This is because repr() now uses full precision for floating point numbers. round() does what it can, but 3.1416 just can't be represented exactly, and "%.17g" gives 3.1415999999999999.

This is definitely the right thing to do for repr() -- ask Tim.

However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()...

--Guido van Rossum (home page: http://www.python.org/~guido/)
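
The effect described above can be reproduced with explicit format strings. This is only a sketch in modern Python 3 syntax, not the 1.5/1.6 interpreters used in the thread, and it assumes (the thread does not say so) that the older, shorter display corresponded to a "%.12g" format. Current Python versions pick the shortest repr() that round-trips, so a bare round() call no longer shows the long form.

```python
x = round(3.1415926585, 4)   # the double nearest to 3.1416

full = "%.17g" % x     # the format cited in the thread for the new repr()
short = "%.12g" % x    # assumed format of the older, shorter display

print(full)    # 3.1415999999999999
print(short)   # 3.1416

# Both strings name the same double once they are parsed back.
print(float(full) == float(short) == x)   # True
```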

On 06 April 2000, Guido van Rossum said:
This is because repr() now uses full precision for floating point numbers. round() does what it can, but 3.1416 just can't be represented exactly, and "%.17g" gives 3.1415999999999999.
This is definitely the right thing to do for repr() -- ask Tim.
However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()...
+1 on this: it's easier to change "foo" to "`foo`" than to "str(foo)" or "print foo". It just makes more sense to use str().

Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result!

Greg

On 07-Apr-00 Greg Ward wrote:
Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result!
Just in case... I hope you haven't missed "print blah.__doc__".

/Mikael

-----------------------------------------------------------------------
E-Mail: Mikael Olofsson <mikael@isy.liu.se>
WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael
Phone: +46 - (0)13 - 28 1343
Telefax: +46 - (0)13 - 28 1339
Date: 07-Apr-00
Time: 14:56:52

This message was sent by XF-Mail.
-----------------------------------------------------------------------

On 07 April 2000, Mikael Olofsson said:
On 07-Apr-00 Greg Ward wrote:
Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result!
Just in case... I hope you haven't missed "print blah.__doc__".
Yeah, I know: my usual mode of operation is this:
>>> blah.__doc__
...repr of docstring...
...sound of me cursing...
>>> print blah.__doc__

The real reason for using str() at the interactive prompt is not to save me keystrokes, but because it just seems like the sensible thing to do. People who understand the str/repr difference, and really want the repr version, can slap backquotes around whatever they're printing.

Greg

Greg wrote:
Yeah, I know: my usual mode of operation is this:
>>> blah.__doc__
...repr of docstring...
...sound of me cursing...
>>> print blah.__doc__
on the other hand, I tend to do this now and then:
>>> blah = foo() # returns chunk of binary data
>>> blah

which, if you use str instead of repr, can reprogram your terminal window in many interesting ways...

but I think I'm +1 on this anyway. or at least +0.90000000000000002

</F>

On Thu, 6 Apr 2000, Guido van Rossum wrote:
However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()...
Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you

Trademark probably belongs to Tim Peters.

--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you
Except I'm not sure what kind of special-casing should be happening.

Put quotes around it without worrying if that makes it a valid string literal is one thought that comes to mind.

Another approach might be what Tk's text widget does -- pass through certain control characters (LF, TAB) and all (even non-ASCII) printing characters, but display other control characters as \x.. escapes rather than risk putting the terminal in a weird mode. No quotes though. Hm, I kind of like this: when used as intended, it will just display the text, with newlines and umlauts etc.; but when printing binary gibberish, it will do something friendly.

There's also the issue of what to do with lists (or tuples, or dicts) containing strings. If we agree on this:

>>> "hello\nworld\n\347" # octal 347 is a cedilla
hello
world
ç
>>>

Then what should ("hello\nworld", "\347") show? I've got enough serious complaints that I don't want to propose that it use repr():

>>> ("hello\nworld", "\347")
('hello\nworld', '\347')
>>>

Other possibilities:

>>> ("hello\nworld", "\347")
('hello
world', 'ç')
>>>

or maybe

>>> ("hello\nworld", "\347")
('''hello
world''', 'ç')
>>>

Of course there's also the Unicode issue -- the above all assumes Latin-1 for stdout.

Still no closure, I think...

--Guido van Rossum (home page: http://www.python.org/~guido/)
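
The Tk-style display rule floated in this message (pass LF, TAB and printing characters through, escape the remaining control characters) might look roughly like the sketch below. It is written for Python 3, the name friendly_display is hypothetical, and str.isprintable() is used as a stand-in for "printing character"; none of this comes from the thread or from Tk itself.

```python
def friendly_display(text):
    """Pass LF, TAB and printable characters through; escape the rest."""
    pieces = []
    for ch in text:
        if ch in "\n\t" or ch.isprintable():
            pieces.append(ch)
        else:
            # Other control characters become \x.. escapes. Code points
            # above 0xff would need a wider escape; this sketch ignores that.
            pieces.append("\\x%02x" % ord(ch))
    return "".join(pieces)

# Ordinary text, including the cedilla, passes through unchanged.
print(friendly_display("hello\nworld\n\347") == "hello\nworld\n\xe7")   # True

# Binary gibberish is shown as escapes instead of reaching the terminal.
print(friendly_display("\x1b[2Jbinary\x00junk"))   # \x1b[2Jbinary\x00junk
```

No quotes are added, matching the "No quotes though" remark; whether the result should be marked as a string at all is left open here, as it is in the thread.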

[Moshe Zadka]
Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you
[Guido]
Except I'm not sure what kind of special-casing should be happening.
Welcome to the club.
Put quotes around it without worrying if that makes it a valid string literal is one thought that comes to mind.
If nothing else <wink>, Ping convinced me the temptation to type that back in will prove overwhelming.
Another approach might be what Tk's text widget does -- pass through certain control characters (LF, TAB) and all (even non-ASCII) printing characters, but display other control characters as \x.. escapes rather than risk putting the terminal in a weird mode.
This must be platform-dependent? Just tried this loop in Win95 IDLE, using Courier:
for i in range(256): print i, chr(i),
Across the whole range, it just showed what Windows always shows in the Courier font (which is usually an (empty or filled) rectangle for most "control characters"). No \x escapes at all.

BTW, note that Tk unhelpfully translates a request for "Courier New" into a request for "Courier", which aren't the same fonts under Windows! So if anyone tries this with the IDLE Windows defaults, and doesn't see all the special characters Windows assigns to the range 128-159 in Courier New, that's why -- most of them aren't assigned under Courier.
No quotes though. Hm, I kind of like this: when used as intended, it will just display the text, with newlines and umlauts etc.; but when printing binary gibberish, it will do something friendly.
Can't be worse than what happens now <wink>.
There's also the issue of what to do with lists (or tuples, or dicts) containing strings. If we agree on this:
>>> "hello\nworld\n\347" # octal 347 is a cedilla
hello
world
ç
>>>
I don't think there is agreement on this, because nothing in the output says "btw, this thing was a string". Is that worth preserving? "It depends" is the only answer I've got to that.
Then what should ("hello\nworld", "\347") show? I've got enough serious complaints that I don't want to propose that it use repr():
>>> ("hello\nworld", "\347")
('hello\nworld', '\347')
>>>
Other possibilities:
>>> ("hello\nworld", "\347")
('hello
world', 'ç')
>>>
or maybe
>>> ("hello\nworld", "\347")
('''hello
world''', 'ç')
>>>
I like the last best.
Of course there's also the Unicode issue -- the above all assumes Latin-1 for stdout.
Still no closure, I think...
It's curious how you invoke "closure" when and only when you don't know what *you* want to do <wink>.

a-guido-divided-against-himself-cannot-stand-ly y'rs - tim
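
For comparison, here is a small Python 3 sketch of two of the tuple displays discussed above. The first part shows what str() does with a tuple in Python 3 (it applies repr() to each element); the helper show_tuple is hypothetical and only imitates the triple-quoted variant for tuples of strings.

```python
pair = ("hello\nworld", "\347")

# Python 3: str() of a tuple uses repr() for the elements, so the newline
# stays escaped (the cedilla is shown as-is because it is printable).
print(str(pair) == "('hello\\nworld', '\347')")   # True

def show_tuple(items):
    """Hypothetical display: raw text, triple quotes around multi-line strings."""
    shown = []
    for item in items:
        quote = "'''" if "\n" in item else "'"
        shown.append(quote + item + quote)
    # Simplification: one-element tuples would also need a trailing comma.
    return "(" + ", ".join(shown) + ")"

print(show_tuple(pair) == "('''hello\nworld''', '\347')")   # True
```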

[posted & mailed] [Christian Tismer]
as a side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch.
Is this changed for 1.6, or might it be my bug?
It's a 1.6 thing, and is not a bug.
D:\python\spc>python
Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> round(3.1415926585, 4)
3.1415999999999999
>>> ^Z
The best possible IEEE-754 double approximation to 3.1416 is (exactly)

3.141599999999999948130380289512686431407928466796875

so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature.

1.6 boosted the number of decimal digits repr(float) produces so that

eval(repr(x)) == x

for every finite float on every platform with an IEEE-754-conforming libc. It was actually rare for that equality to hold pre-1.6. repr() cannot produce fewer digits than this without allowing the equality to fail in some cases.

The 1.6 str() still produces the *illusion* that the result is 3.1416 (as repr() also did pre-1.6).

IMO it would be better if Python stopped using repr() (at least by default) for formatting expressions at the interactive prompt (for much more on this, see DejaNews).

the-two-things-you-can-do-about-it-are-nothing-and-love-it<wink>-ly y'rs - tim
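
The exact value and the round-trip property can be checked with the standard decimal module. This is a sketch for Python 3 on an IEEE-754 platform; the 0.1 + 0.2 value is an illustrative choice that does not appear in the thread.

```python
from decimal import Decimal

x = 3.1416

# Decimal(float) converts the stored binary value exactly, with no rounding.
print(Decimal(x))
# 3.141599999999999948130380289512686431407928466796875

# 17 significant digits are enough to recover the same double...
y = 0.1 + 0.2
print("%.17g" % y, float("%.17g" % y) == y)   # 0.30000000000000004 True

# ...but 16 digits are not always enough, so the equality can fail.
print("%.16g" % y, float("%.16g" % y) == y)   # 0.3 False
```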

Tim Peters wrote:
The best possible IEEE-754 double approximation to 3.1416 is (exactly)
3.141599999999999948130380289512686431407928466796875
so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature.
I'm very respectful when I see a number with so many digits in a row. :-)

I'm not sure that this will be of any interest to you, number crunchers, but a research team in computer arithmetics here reported some major results lately: they claim that they "solved" the Table Maker's Dilemma for most common functions in IEEE-754 double precision arithmetic. (and no, don't ask me what this means ;-)

For more information, see: http://www.ens-lyon.fr/~jmmuller/Intro-to-TMD.htm

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

[Vladimir Marangozov]
I'm not sure that this will be of any interest to you, number crunchers, but a research team in computer arithmetics here reported some major results lately: they claim that they "solved" the Table Maker's Dilemma for most common functions in IEEE-754 double precision arithmetic. (and no, don't ask me what this means ;-)
Back in the old days, some people spent decades making tables of various function values. A common way was to laboriously compute high-precision values over a sparse grid, using e.g. series expansions, then extend that to a fine grid via relatively simple interpolation formulas between the high-precision results. You have to compute the sparse grid to *some* "extra" precision in order to absorb roundoff errors in the interpolated values. The "dilemma" is figuring out how *much* extra precision: too much and it greatly slows the calculations, too little and the interpolated values are inaccurate.

The "problem cases" for a function f(x) are those x such that the exact value of f(x) is very close to being exactly halfway between representable numbers. In order to round correctly, you have to figure out which representable number f(x) is closest to. How much extra precision do you need to use to resolve this correctly in all cases?

Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. That's not enough to know whether it *should* round to 41 or 42. So you need to try again with more precision. But how much? You might try 5 digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? Might be a waste of effort. Try 20 next? Might *still* not be enough -- or could just as well be that 7 would have been enough and you did 10x the work you needed to do. Etc.

It turns out that for most functions there's no general way known to answer the "how much?" question in advance: brute force is the best method known. For various IEEE double precision functions, so far it's turned out that you need in the ballpark of 40-60 extra accurate bits (beyond the native 53) in order to round back correctly to 53 in all cases, but there's no *theory* supporting that. It *could* require millions of extra bits.
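
The 41.49 +- 3 example can be played through in a few lines. This is a sketch in Python 3 that reads "+- 3" as three units in the last digit shown (an interpretation, not something the text spells out); try_round is a hypothetical helper, and the third attempt is invented purely to show a case where the decision finally becomes possible.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def try_round(approx, error):
    """Return the rounded integer if the error interval allows a decision, else None."""
    low = (Decimal(approx) - Decimal(error)).to_integral_value(rounding=ROUND_HALF_EVEN)
    high = (Decimal(approx) + Decimal(error)).to_integral_value(rounding=ROUND_HALF_EVEN)
    # If both ends of the interval round the same way, so does everything between.
    return int(low) if low == high else None

# (approximation, error bound) pairs; strings keep the decimals exact.
attempts = [
    ("41.49", "0.03"),       # 4-digit arithmetic, from the text
    ("41.501", "0.003"),     # 5-digit arithmetic, from the text
    ("41.5012", "0.0003"),   # hypothetical 6-digit result, for illustration only
]
for approx, error in attempts:
    print(approx, "+-", error, "->", try_round(approx, error))
# 41.49 +- 0.03 -> None
# 41.501 +- 0.003 -> None
# 41.5012 +- 0.0003 -> 42
```

The loop only knows it can stop once the interval no longer straddles 41.5; nothing tells it in advance how many attempts that will take, which is the dilemma.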
For those wondering "why bother?", the practical answer is this: if a std could require correct rounding, functions would be wholly portable across machines ("correctly rounded" is precisely defined by purely mathematical means). That's where IEEE-754 made its huge break with tradition, by requiring correct rounding for + - * / and sqrt. The places it left fuzzy (like string<->float, and all transcendental functions) are the places your program produces different results when you port it.

Irritating one: MS VC++ on Intel platforms generates different code for exp() depending on the optimization level. They often differ in the last bit they compute. This wholly accounts for why Dragon's speech recognition software sometimes produces subtly (but very visibly!) different results depending on how it was compiled. Before I got tossed into this pit, it was assumed for a year to be either a -O bug or somebody fetching uninitialized storage.

that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim

Tim Peters wrote:
Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. That's not enough to know whether it *should* round to 41 or 42. So you need to try again with more precision. But how much? You might try 5 digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? Might be a waste of effort. Try 20 next? Might *still* not be enough -- or could just as well be that 7 would have been enough and you did 10x the work you needed to do.
Right. From what I understand, the dilemma is this: in order to round correctly, how much extra precision do we need so that the range of uncertainty (+-3 in your example) does not contain the midpoint of two consecutive representable numbers (41.5, between 41 and 42, in your example)?

"Solving" the dilemma is predicting this extra precision so that the ranges of uncertainty do not contain the midpoint of two consecutive floats. This in turn amounts to calculating the minimum distance between the image of a number and the midpoint of two consecutive machine numbers. And that's what these guys have calculated for common functions in IEEE-754 double precision, with brute force, using an apparently original algorithm they have proposed.
that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim
I haven't asked for anything. It was just passive echoing with a good level of uncertainty :-).

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
participants (9)
- Christian Tismer
- Fredrik Lundh
- Greg Ward
- Guido van Rossum
- Mikael Olofsson
- Moshe Zadka
- Skip Montanaro
- Tim Peters
- Vladimir.Marangozov@inrialpes.fr