Re: [Python-Dev] Round Bug in Python 1.6?
Tim Peters wrote:
The best possible IEEE-754 double approximation to 3.1416 is (exactly)
3.141599999999999948130380289512686431407928466796875
Let's call this number 'A' for the sake of discussion.
so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature.
Clearly there is something very wrong here:

    Python 1.5.2+ (#2, Mar 28 2000, 18:27:50)
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> 3.1416
    3.1415999999999999
    >>>

Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x, but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A.

I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases.

It's very jarring to type something in, and have the interpreter give you back something that looks very different. It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system. (What do you do then, start explaining the IEEE double representation to your CP4E beginner?)

What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here.

I haven't investigated how to do this best yet. I'll go off now and see if i can come up with an algorithm that's not quite so stupid as

    def smartrepr(x):
        p = 17
        while eval('%%.%df' % (p - 1) % x) == x:
            p = p - 1
        return '%%.%df' % p % x

-- ?!ng
Tim Peters wrote:
The best possible IEEE-754 double approximation to 3.1416 is (exactly)
3.141599999999999948130380289512686431407928466796875
Let's call this number 'A' for the sake of discussion.
so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature.
Clearly there is something very wrong here:
    Python 1.5.2+ (#2, Mar 28 2000, 18:27:50)
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> 3.1416
    3.1415999999999999
    >>>
Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x, but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A.
Ping has a point!
I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases.
It's very jarring to type something in, and have the interpreter give you back something that looks very different. It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system. (What do you do then, start explaining the IEEE double representation to your CP4E beginner?)
What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here.
I haven't investigated how to do this best yet. I'll go off now and see if i can come up with an algorithm that's not quite so stupid as
    def smartrepr(x):
        p = 17
        while eval('%%.%df' % (p - 1) % x) == x:
            p = p - 1
        return '%%.%df' % p % x
Have a look at what Java does; it seems to be doing this right:

    & jpython
    JPython 1.1 on java1.2 (JIT: sunwjit)
    Copyright (C) 1997-1999 Corporation for National Research Initiatives
    >>> import java.lang
    >>> x = java.lang.Float(3.1416)
    >>> x.toString()
    '3.1416'
    >>> ^D
    &
Could it be as simple as converting x +/- one bit and seeing how many differing digits there were? (Not that +/- one bit is easy to calculate...)

--Guido van Rossum (home page: http://www.python.org/~guido/)
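[Editorial aside: Guido's "+/- one bit" is in fact computable by reinterpreting the IEEE-754 bit pattern as an integer, since adjacent bit patterns are adjacent positive finite doubles. A minimal sketch in modern Python; the helper names are invented for illustration, and math.nextafter (Python 3.9+) now does this directly:]

```python
import struct

def next_float_up(x):
    # Reinterpret the double's IEEE-754 bits as a 64-bit int and add one;
    # for positive finite x this yields the next representable double.
    bits = struct.unpack('<q', struct.pack('<d', x))[0]
    return struct.unpack('<d', struct.pack('<q', bits + 1))[0]

def next_float_down(x):
    bits = struct.unpack('<q', struct.pack('<d', x))[0]
    return struct.unpack('<d', struct.pack('<q', bits - 1))[0]

a = 3.1416
print(next_float_down(a), a, next_float_up(a))  # the three neighbouring doubles
```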
[Guido]
Have a look at what Java does; it seems to be doing this right:
    & jpython
    JPython 1.1 on java1.2 (JIT: sunwjit)
    Copyright (C) 1997-1999 Corporation for National Research Initiatives
    >>> import java.lang
    >>> x = java.lang.Float(3.1416)
    >>> x.toString()
    '3.1416'
That Java does this is not an accident: Guy Steele pushed for the same rules he got into Scheme, although (a) the Java rules are much tighter than Scheme's, and (b) he didn't prevail on this point in Java until version 1.1 (before then, Java's double/float->string never produced more precision than ANSI C's default %g format, so was inadequate to preserve equality under I/O). I suspect there was more than a bit of internal politics behind the delay, as the 754 camp has never liked the "minimal width" gimmick(*), and Sun's C and Fortran numerics (incl. their properly-rounding libc I/O routines) were strongly influenced by 754 committee members.
Could it be as simple as converting x +/- one bit and seeing how many differing digits there were? (Not that +/- one bit is easy to calculate...)
Sorry, it's much harder than that. See the papers (and/or David Gay's code) I referenced before. (*) Why the minimal-width gimmick is disliked: If you print a (32-bit) IEEE float with minimal width, then read it back in as a (64-bit) IEEE double, you may not get the same result as if you had converted the original float to a double directly. This is because "minimal width" here is *relative to* the universe of 32-bit floats, and you don't always get the same minimal width if you compute it relative to the universe of 64-bit doubles instead. In other words, "minimal width" can lose accuracy needlessly -- but this can't happen if you print the float to full precision instead.
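[Editorial aside: Tim's footnote can be demonstrated by simulating 32-bit singles with the struct module. The helper name f32 is invented for illustration; it rounds a double to the nearest IEEE single and widens it back:]

```python
import struct

def f32(x):
    # Round a double to the nearest IEEE-754 single, then widen back to double.
    return struct.unpack('<f', struct.pack('<f', x))[0]

x = f32(0.1)                     # the single nearest 1/10, seen as a double
# "0.1" is a minimal-width string for that *single*: reading it back as a
# single recovers x exactly ...
assert f32(float('0.1')) == x
# ... but reading "0.1" directly as a *double* gives a different double than
# widening the single -- the needless accuracy loss the footnote describes.
assert float('0.1') != x
```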
[Ka-Ping Yee]
... Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x,
Yes. This was first proved in Jerome Coonen's doctoral dissertation, and is one of the few things IEEE-754 guarantees about fp I/O: that input(output(x)) == x for all finite double x provided that output() produces at least 17 significant decimal digits (and 17 is minimal). In particular, IEEE-754 does *not* guarantee that either I or O are properly rounded, which latter is needed for what *you* want to see here. The std doesn't require proper rounding in this case (despite that it requires it in all other cases) because no efficient method for doing properly rounded I/O was known at the time (and, alas, that's still true).
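[Editorial aside: the 17-digit guarantee, and 16 digits falling short, is easy to check on any IEEE-754 machine. A quick sketch:]

```python
# A double for which 16 significant digits are not enough to round-trip:
x = 0.1 + 0.2                     # stored exactly as 0.30000000000000004440892...
assert float('%.16g' % x) != x    # 16 digits read back as a *different* double
assert float('%.17g' % x) == x    # 17 digits always round-trip (Coonen's result)
```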
but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A.
Well, x = 1.0 provides a simpler case <wink>.
I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases.
It's very jarring to type something in, and have the interpreter give you back something that looks very different.
It's in the very nature of binary floating-point that the numbers they type in are often not the numbers the system uses.
It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system.
If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality.
(What do you do then, start explaining the IEEE double representation to your CP4E beginner?)
As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be).
What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here.
This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose (and, indeed, I don't believe any libc other than Sun's does do proper rounding here).

For background and code, track down "How To Print Floating-Point Numbers Accurately" by Steele & White, and its companion paper (s/Print/Read/) by Clinger. Steele & White were specifically concerned with printing the "shortest" fp representation possible such that proper input could later reconstruct the value exactly. Steele, White & Clinger give relatively simple code for this that relies on unbounded int arithmetic. Excruciatingly difficult and platform-#ifdef'ed "optimized" code for this was written & refined over several years by the numerical analyst David Gay, and is available from Netlib.
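[Editorial aside: the "shortest conversion" idea can be sketched on top of the platform's own conversions -- with exactly the caveat Tim raises, that this leans on the libc float<->string routines being correctly rounded. The function name is invented for illustration:]

```python
def shortest_repr(x):
    # Fewest-significant-digit string that converts back to exactly x.
    # NOTE: correctness depends on the platform's '%g' and float() being
    # correctly rounded, which is the libc caveat discussed in this thread.
    for ndigits in range(1, 18):
        s = '%.*g' % (ndigits, x)
        if float(s) == x:
            return s
    return '%.17g' % x            # 17 significant digits always round-trip

print(shortest_repr(3.1416))      # '3.1416' on IEEE-754 doubles
```

For what it's worth, CPython did eventually adopt exactly this behaviour: since Python 3.1, repr(float) produces the shortest string that round-trips, built on an adaptation of David Gay's correctly-rounding conversion code.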
I haven't investigated how to do this best yet. I'll go off now and see if i can come up with an algorithm that's not quite so stupid as
    def smartrepr(x):
        p = 17
        while eval('%%.%df' % (p - 1) % x) == x:
            p = p - 1
        return '%%.%df' % p % x
This merely exposes accidents in the libc on the specific platform you run it. That is, after

    print smartrepr(x)

on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with. Both platforms have to do proper rounding to make this work; there's no way to do proper rounding by using libc; so Python has to do it itself; there's no efficient way to do it regardless; nevertheless, it's a noble goal, and at least a few languages in the Lisp family require it (most notably Scheme, from whence Steele, White & Clinger's interest in the subject).

you're-in-over-your-head-before-the-water-touches-your-toes<wink>-ly y'rs - tim
In a previous message, i wrote:
It's very jarring to type something in, and have the interpreter give you back something that looks very different. [...] It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system.
Then on Fri, 7 Apr 2000, Tim Peters replied:
If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality.
This is not lying! If you type in "3.1416" and Python says "3.1416", then indeed it is the case that "3.1416" is a correct way to type in the floating-point number being expressed. So "3.1415999999999999" is not any more truthful than "3.1416" -- it's just more annoying.

I just tried this in Python 1.5.2+:

    >>> .1
    0.10000000000000001
    >>> .2
    0.20000000000000001
    >>> .3
    0.29999999999999999
    >>> .4
    0.40000000000000002
    >>> .5
    0.5
    >>> .6
    0.59999999999999998
    >>> .7
    0.69999999999999996
    >>> .8
    0.80000000000000004
    >>> .9
    0.90000000000000002

Ouch.

I wrote:
(What do you do then, start explaining the IEEE double representation to your CP4E beginner?)
Tim replied:
As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be).
What, then? Introduce a third conversion routine and further complicate the issue? I don't see why it's necessary. I wrote:
What should really happen is that floats intelligently print in the shortest and simplest manner possible
Tim replied:
This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose
Not "all fp I/O conversions", right? Only repr(float) needs to be implemented for this particular purpose. Other conversions like "%f" and "%g" can be left to libc, as they are now. I suppose for convenience's sake it may be nice to add another format spec so that one can ask for this behaviour from the "%" operator as well, but that's a separate issue (perhaps "%r" to insert the repr() of an argument of any type?).
For background and code, track down "How To Print Floating-Point Numbers Accurately" by Steele & White, and its companion paper (s/Print/Read/)
Thanks! I found 'em. Will read... I suggested:
    def smartrepr(x):
        p = 17
        while eval('%%.%df' % (p - 1) % x) == x:
            p = p - 1
        return '%%.%df' % p % x
Tim replied:
This merely exposes accidents in the libc on the specific platform you run it. That is, after
print smartrepr(x)
on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with.
That is not repr()'s job. Once again: repr() is not for the machine. It is not part of repr()'s contract to ensure the kind of platform-independent conversion you're talking about. It prints out the number in a way that upholds the eval(repr(x)) == x contract for the system you are currently interacting with, and that's good enough. If you wanted platform-independent serialization, you would use something else.

As long as the language reference says

    "These represent machine-level double precision floating point numbers.
    You are at the mercy of the underlying machine architecture and C
    implementation for the accepted range and handling of overflow."

and until Python specifies the exact sizes and behaviours of its floating-point numbers, you can't expect these kinds of cross-platform guarantees anyway.

Here are the expectations i've come to have:

str()'s contract:
  - if x is a string, str(x) == x
  - otherwise, str(x) is a reasonable string coercion from x

repr()'s contract:
  - if repr(x) is syntactically valid, eval(repr(x)) == x
  - repr(x) displays x in a safe and readable way
  - for objects composed of basic types, repr(x) reflects what the user
    would have to say to produce x

pickle's contract:
  - pickle.dumps(x) is a platform-independent serialization of the value
    and state of object x

-- ?!ng
Ok, just a word (carefully:) Ka-Ping Yee wrote: ...
I just tried this in Python 1.5.2+:
    >>> .1
    0.10000000000000001
    >>> .2
    0.20000000000000001
    >>> .3
    0.29999999999999999
Agreed that this is not good. ...
repr()'s contract:
  - if repr(x) is syntactically valid, eval(repr(x)) == x
  - repr(x) displays x in a safe and readable way
  - for objects composed of basic types, repr(x) reflects what the user
    would have to say to produce x
This sounds reasonable.

BTW my problem did not come up by typing something in, but I just rounded a number down to 3 digits past the dot. Then, as usual, I just let the result drop from the prompt, without prefixing it with "print". repr() was used, and the result was astonishing.

Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this? And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not?

Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers. Simple operations between exact numbers would keep exactness, higher level functions would probably not.

I think we delved into a very difficult domain here.

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
On Sun, 9 Apr 2000, Christian Tismer wrote:
Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this?
And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not?
It's okay for the zero to go away, because it doesn't affect the value of the number. (Carrying around a significant-digit count or error range with numbers is another issue entirely, and a very thorny one at that.)

I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees:

  - If you just type a number in and the interpreter prints it back, it
    will never respond with more junk digits than you typed.
  - If you type in what the interpreter displays for a float, you can be
    assured of getting the same value.
Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers.
If you mean a decimal representation, yes, perhaps we need to explore that possibility a little more. -- ?!ng "All models are wrong; some models are useful." -- George Box
Ka-Ping Yee wrote:
On Sun, 9 Apr 2000, Christian Tismer wrote:
Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this?
And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not?
It's okay for the zero to go away, because it doesn't affect the value of the number. (Carrying around a significant-digit count or error range with numbers is another issue entirely, and a very thorny one at that.)
I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees:
Hmm, I hope I understood. Oh, wait a minute! What is the method? What is the correct value? If I type
    >>> 0.1
    0.10000000000000001
    >>> 0.10000000000000001
    0.10000000000000001
There is only one value: The one which is in the machine. Would you think it is ok to get 0.1 back, when you actually *typed* 0.10000000000000001 ?

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
[Christian]
Hmm, I hope I understood. Oh, wait a minute! What is the method? What is the correct value?
If I type
    >>> 0.1
    0.10000000000000001
    >>> 0.10000000000000001
    0.10000000000000001
There is only one value: The one which is in the machine. Would you think it is ok to get 0.1 back, when you actually *typed* 0.10000000000000001 ?
Yes, this is the kind of surprise I sketched with the "2-bit machine" example. It can get more surprising than the above (where, as you suspect, "shortest conversion" yields "0.1" for both -- which, btw, is why reading it back in to a float type with more precision loses accuracy needlessly, which in turn is why 754 True Believers dislike it). repetitively y'rs - tim
[Ping]
... I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees:
- If you just type a number in and the interpreter prints it back, it will never respond with more junk digits than you typed.
Note the example from another reply of a machine with 2-bit floats. There the user would see:
    >>> 0.75   # happens to be exactly representable on this machine
    0.8        # because that's the shortest string needed on this machine
               # to get back 0.75 internally
This kind of surprise is inherent in the approach, not specific to 2-bit machines <wink>. BTW, I don't know that it will never print more digits than you type: did you prove that? It's plausible, but many plausible claims about fp turn out to be false.
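[Editorial aside: Tim's toy machine can be simulated exactly. A sketch using a small, invented universe of "2-bit" floats (a 2-bit significand times a power of two); the helper names are made up for illustration:]

```python
from fractions import Fraction

# Toy floats: significand 1.0 or 1.1 (binary) times a power of two.
UNIVERSE = sorted({Fraction(m, 2) * Fraction(2) ** e
                   for m in (2, 3) for e in range(-3, 3)})
# ... 1/4, 3/8, 1/2, 3/4, 1, 3/2, 2, ...

def to_toy(d):
    # "Input conversion": round an exact decimal value to the nearest toy float.
    return min(UNIVERSE, key=lambda v: abs(v - d))

def shortest_decimal(v):
    # Fewest decimal places whose reading rounds back to exactly v.
    ndigits = 1
    while True:
        s = '%.*f' % (ndigits, float(v))
        if to_toy(Fraction(s)) == v:
            return s
        ndigits += 1

print(shortest_decimal(Fraction(3, 4)))   # prints '0.8', just as Tim describes
```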
- If you type in what the interpreter displays for a float, you can be assured of getting the same value.
This isn't of value for most interactive use -- in general you want to see the range of a number, not enough to get 53 bits exactly (that's beyond the limits of human "number sense"). It also has one clearly bad aspect: when printing containers full of floats, the number of digits printed for each will vary wildly from float to float. Makes for an unfriendly display. If the prompt's display function were settable, I'd probably plug in pprint!
Sorry, i'm a little behind on this. I'll try to catch up over the next day or two. On Sun, 9 Apr 2000, Tim Peters wrote:
Note the example from another reply of a machine with 2-bit floats. There the user would see:
    >>> 0.75   # happens to be exactly representable on this machine
    0.8        # because that's the shortest string needed on this machine
               # to get back 0.75 internally
This kind of surprise is inherent in the approach, not specific to 2-bit machines <wink>.
Okay, okay. But on a 2-bit machine you ought to be no more surprised by the above than by

    >>> 0.1 + 0.1
    0.0
    >>> 0.4 + 0.4
    1.0

In fact, i suppose one could argue that 0.8 is just as honest as 0.75, as you could get 0.8 from anything in (0.625, 0.875)... or even *more* honest than 0.75, since "0.75" shows more significant digits than the precision of the machine would justify. <shrug> It could be argued either way. I don't see this as a fatal flaw of the 'smartrepr' method, though.

After looking at the spec for java.lang.Float.toString() and the Clinger paper you mentioned, it appears to me that both essentially describe 'smartrepr', which seems encouraging.
BTW, I don't know that it will never print more digits than you type: did you prove that? It's plausible, but many plausible claims about fp turn out to be false.
Indeed, fp *is* tricky, but i think in this case the proof actually is pretty evident -- The 'smartrepr' routine i suggested prints the representation with the fewest number of digits which converts back to the actual value. Since the thing that you originally typed converted to that value the first time around, certainly no *more* digits than what you typed are necessary to produce that value again. QED.
- If you type in what the interpreter displays for a float, you can be assured of getting the same value.
This isn't of value for most interactive use -- in general you want to see the range of a number, not enough to get 53 bits exactly (that's beyond the limits of human "number sense").
What do you mean by "the range of a number"?
It also has one clearly bad aspect: when printing containers full of floats, the number of digits printed for each will vary wildly from float to float. Makes for an unfriendly display.
Yes, this is something you want to be able to control -- read on.
If the prompt's display function were settable, I'd probably plug in pprint!
Since i've managed to convince Guido that such a hook might be nice, i seem to have worked myself into the position of being responsible for putting together a patch to do so...

Configurability is good. It won't solve everything, but at least the flexibility provided by a "display" hook will let everybody have the ability to play whatever tricks they want. (Or, equivalently: to anyone who complains about the interpreter display, at least we have plausible grounds on which to tell them to go fix it themselves.) :)

Here is what i have in mind: provide two hooks

    __builtins__.display(object)
    __builtins__.displaytb(traceback, exception)

that are called when the interpreter needs to display a result or when the top level catches an exception. Protocol is simple: 'display' gets one argument, an object, and can do whatever the heck it wants. 'displaytb' gets a traceback and an exception, and can do whatever the heck it wants.

-- ?!ng
"Je n'aime pas les stupides garçons, même quand ils sont intelligents."
("I don't like stupid boys, even when they are intelligent.") -- Roople Unia
Ka-Ping Yee writes:
Here is what i have in mind: provide two hooks __builtins__.display(object) and __builtins__.displaytb(traceback, exception)
Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't want to add new display() and displaytb() built-ins, do we? --amk
On Wed, 12 Apr 2000, Andrew M. Kuchling wrote:
Ka-Ping Yee writes:
Here is what i have in mind: provide two hooks __builtins__.display(object) and __builtins__.displaytb(traceback, exception)
Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't want to add new display() and displaytb() built-ins, do we?
Yes, you're right, they belong in sys. For a while i was under the delusion that you could customize more than one sub-interpreter by giving each one a different modified __builtins__, but that's an rexec thing and completely the wrong approach.

Looks like the right approach to customizing sub-interpreters is to generalize the interface of code.InteractiveInterpreter and add more options to code.InteractiveConsole. sys.display and sys.displaytb would then be specifically for tweaking the main interactive interpreter only (just like sys.ps1 and sys.ps2). Still quite worth it, i believe, so i'll proceed.

-- ?!ng
"You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith
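[Editorial aside: these hooks did land, as sys.displayhook and sys.excepthook. A minimal sketch of plugging in pprint at the prompt, as Tim mused earlier in the thread:]

```python
import builtins
import pprint
import sys

def display(obj):
    # Mimic the default hook: skip None, remember the result in '_',
    # but pretty-print containers instead of using plain repr().
    if obj is not None:
        builtins._ = obj
        pprint.pprint(obj)

sys.displayhook = display   # restore with: sys.displayhook = sys.__displayhook__
```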
[Christian Tismer]
... Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back.
But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this?
And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not?
Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers. Simple operations between exact numbers would keep exactness, higher level functions would probably not.
I think we dlved into a very difficult domain here.
"This kind of thing" is hopeless so long as Python uses binary floating point. Ping latched on to "shortest" conversion because it appeared to solve "the problem" in a specific case. But it doesn't really solve anything -- it just shuffles the surprises around. For example,
    >>> 3.1416 - 3.141
    0.00059999999999993392
Do "shortest conversion" (relative to the universe of IEEE doubles) instead, and it would print

    0.0005999999999999339

Neither bears much syntactic resemblance to the 0.0006 the numerically naive "expect". Do anything less than the 16 significant digits shortest conversion happens to produce in this case, and eval'ing the string won't return the number you started with. So "0.0005999999999999339" is the "best possible" string repr can produce (assuming you think "best" == "shortest faithful, relative to the platform's universe of possibilities", which is itself highly debatable).

If you don't want to see that at the interactive prompt, one of two things has to change:

  A) Give up on eval(repr(x)) == x for float x, even on a single machine.

or

  B) Stop using repr by default.

There is *no* advantage to #A over the long haul: lying always extracts a price, and unlike most of you <wink>, I appeared to be the lucky email recipient of the passionate gripes about repr(float)'s inadequacy in 1.5.2 and before. Giving a newbie an illusion of comfort at the cost of making it useless for experts is simply nuts.

The desire for #B pops up from multiple sources: people trying to use native non-ASCII chars in strings; people just trying to display docstrings without embedded "\012" (newline) and "\011" (tab) escapes; and people using "big" types (like NumPy arrays or rationals) where repr() can produce unboundedly more info than the interactive user typically wants to see.

It *so happens* that str() already "does the right thing" for all three of those complaints, and also happens to produce "0.0006" for the example above. This is why people leap to:

  C) Use str by default instead of repr.

But str doesn't pass down to containees, and *partly* does a wrong thing when applied to strings, so it's not suitable either. It's *more* suitable than repr, though!

trade-off-ing-ly y'rs - tim
[Tim]
If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality.
This is not lying!
Yes, I overstated that. It's not lying, but I defy anyone to explain the full truth of it in a way even Guido could understand <0.9 wink>. "Shortest conversion" is a subtle concept, requiring knowledge not only of the mathematical value, but of details of the HW representation. Plain old "correct rounding" is HW-independent, so is much easier to *fully* understand. And in things floating-point, what you don't fully understand will eventually burn you. Note that in a machine with 2-bit floating point, the "shortest conversion" for 0.75 is the string "0.8": this should suggest the sense in which "shortest conversion" can be actively misleading too.
If you type in "3.1416" and Python says "3.1416", then indeed it is the case that "3.1416" is a correct way to type in the floating-point number being expressed. So "3.1415999999999999" is not any more truthful than "3.1416" -- it's just more annoying.
Yes, shortest conversion is *defensible*. But Python has no code to implement that now, so it's not an option today.
I just tried this in Python 1.5.2+:
    >>> .1
    0.10000000000000001
    >>> .2
    0.20000000000000001
    >>> .3
    0.29999999999999999
    >>> .4
    0.40000000000000002
    >>> .5
    0.5
    >>> .6
    0.59999999999999998
    >>> .7
    0.69999999999999996
    >>> .8
    0.80000000000000004
    >>> .9
    0.90000000000000002
Ouch.
As shown in my reply to Christian, shortest conversion is not a cure for this "gosh, it printed so much more than I expected it to"; it only appears to "fix it" in the simplest examples. So long as you want eval(what's_displayed) == what's_typed, this is unavoidable. The only ways to avoid that are to use a different arithmetic, or stop using repr() at the prompt.
As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be).
What, then? Introduce a third conversion routine and further complicate the issue? I don't see why it's necessary.
Because I almost never want current repr() or str() at the prompt, and even you <wink> don't want 3.1416-3.141 to display 0.0005999999999999339 (which is the least you can print and have eval return the true answer).
What should really happen is that floats intelligently print in the shortest and simplest manner possible
This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose
Not "all fp I/O conversions", right? Only repr(float) needs to be implemented for this particular purpose. Other conversions like "%f" and "%g" can be left to libc, as they are now.
No, all, else you risk %f and %g producing results that are inconsistent with repr(), which creates yet another set of incomprehensible surprises. This is not an area that rewards half-assed hacks! I'm intimately familiar with just about every half-assed hack that's been tried here over the last 20 years -- they never work in the end. The only approach that ever bore fruit was 754's "there is *a* mathematically correct answer, and *that's* the one you return". Unfortunately, they dropped the ball here on float<->string conversions (and very publicly regret that today).
I suppose for convenience's sake it may be nice to add another format spec so that one can ask for this behaviour from the "%" operator as well, but that's a separate issue (perhaps "%r" to insert the repr() of an argument of any type?).
%r is cool! I like that.
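[Editorial aside: the '%r' spec did make it into the language, and the same idea later appeared as the !r conversion in str.format and f-strings. A quick check in modern Python:]

```python
# '%r' inserts repr() of an argument of any type, exactly as proposed:
print('%r -> %r' % ('3.1416', 3.1416))

# The same conversion spelled as an f-string:
assert f'{3.1416!r}' == repr(3.1416)
```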
    def smartrepr(x):
        p = 17
        while eval('%%.%df' % (p - 1) % x) == x:
            p = p - 1
        return '%%.%df' % p % x
This merely exposes accidents in the libc on the specific platform you run it. That is, after
print smartrepr(x)
on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with.
That is not repr()'s job. Once again:
repr() is not for the machine.
And once again, I didn't and don't agree with that, and, to save the next seven msgs, never will <wink>.
It is not part of repr()'s contract to ensure the kind of platform-independent conversion you're talking about. It prints out the number in a way that upholds the eval(repr(x)) == x contract for the system you are currently interacting with, and that's good enough.
It's not good enough for Java and Scheme, and *shouldn't* be good enough for Python. The 1.6 repr(float) is already platform-independent across IEEE-754 machines (it's not correctly rounded on most platforms, but *does* print enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all Python platforms are IEEE-754 (I don't know of an exception -- perhaps Python is running on some ancient VAX?). The std has been around for 15+ years, virtually all platforms support it fully now, and it's about time languages caught up.

BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats either, even on a single machine. It is now -- but thanks to the change in repr.
If you wanted platform-independent serialization, you would use something else.
There is nothing else. In 1.5.2 and before, people mucked around with binary dumps hoping they didn't screw up endianness.
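[Editorial aside: the endianness-guessing Tim mentions is avoidable by pinning the byte order explicitly. A sketch of a platform-independent binary float dump; note that in modern Python, struct guarantees IEEE-754 encoding for 'd' whenever a '<' or '>' byte-order prefix is used, even on a non-IEEE host:]

```python
import struct

def dump_float(x):
    # '<d' means little-endian IEEE-754 double, regardless of host byte order.
    return struct.pack('<d', x)

def load_float(data):
    return struct.unpack('<d', data)[0]

blob = dump_float(3.1416)        # always 8 bytes, same bits on every host
assert load_float(blob) == 3.1416
```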
As long as the language reference says
"These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture and C implementation for the accepted range and handling of overflow."
and until Python specifies the exact sizes and behaviours of its floating-point numbers, you can't expect these kinds of cross-platform guarantees anyway.
There's nothing wrong with exceeding expectations <wink>. Despite what the reference manual says, virtually all machines use identical fp representations today (this wasn't true when the text above was written).
str()'s contract: - if x is a string, str(x) == x - otherwise, str(x) is a reasonable string coercion from x
The last is so vague as to say nothing. My counterpart -- at least equally vague -- is

  - otherwise, str(x) is a string that's easy to read and contains a compact
    summary indicating x's nature and value in general terms
repr()'s contract: - if repr(x) is syntactically valid, eval(repr(x)) == x - repr(x) displays x in a safe and readable way
I would say instead: - every character c in repr(x) has ord(c) in range(32, 128) - repr(x) should strive to be easily readable by humans
- for objects composed of basic types, repr(x) reflects what the user would have to say to produce x
Given your first point, does this say something other than "for basic types, repr(x) is syntactically valid"? Also unclear what "basic types" means.
pickle's contract: - pickle.dumps(x) is a platform-independent serialization of the value and state of object x
Since pickle can't handle all objects, this exaggerates the difference between it and repr. Give a fuller description, like

  - If pickle.dumps(x) is defined, pickle.loads(pickle.dumps(x)) == x

and it's the same as the first line of your repr() contract, modulo

    s/syntactically valid/is defined/
    s/eval/pickle.loads/
    s/repr/pickle.dumps/

The differences among all these guys remain fuzzy to me.

but-not-surprising-when-talking-about-what-people-like-to-look-at-ly y'rs - tim
participants (5)
- Andrew M. Kuchling
- Christian Tismer
- Guido van Rossum
- Ka-Ping Yee
- Tim Peters