
You're right -- I apparently don't understand the issues well enough. And (worse) I didn't understand how far my understanding went. I was basing my proposal on Tim Delaney's suggestion and on a couple quick experiments on my 2 machines. Given that things aren't as simple as I had hoped they were, I guess I'll leave it up to people who know more to figure it out. But I'm not particularly happy about the fact that you have to be an expert to understand the intricacies of basic arithmetic with floats (not that I have a way to fix it). -Edward

On Tue, Mar 30, 2004, Edward Loper wrote:
On Tue, 30 Mar 2004, Aahz wrote:
I regret that this "feature" was ever introduced or "fixed" or what have you. Things were much better when repr(1.1) was "1.1" a few versions ago. This inconsistency is strange and surprising to every Python learner and I still believe there is no good reason for it.

The motivation, as i remember it, was to make repr(x) produce a cross-platform representation of x. But no one uses repr() expecting bit-for-bit identity across platforms. repr() can't even represent most objects; if you want to transfer things between platforms, you would use pickle.

If typing in "1.1" produces x, then "1.1" is a perfectly accurate representation of x on the current platform. And that is sufficient. Showing "1.1000000000000001" is a clear case of confusing lots of people in exchange for an obscure benefit to very few.

If i could vote for just one thing to roll back about Python, this would be it. -- ?!ng

I wish that Python would use the same conversion rules as Scheme:

    string->float yields the closest rounded approximation to the
    infinite-precision number represented by the string

    float->string yields the string with the fewest significant digits
    that, when converted as above, yields exactly the same floating-point
    value

These rules guarantee that 1.1 will always print as 1.1, and also that printing any floating-point value and reading it back in again will give exactly the same results. They do, however, have three disadvantages:

1) They are a pain to implement correctly.
2) There are some pathological cases that take a long time to convert.
3) Results may be different from the local C implementation.

(1) can be ameliorated on many platforms by using David Gay's implementation (www.netlib.org/fp), which is distributed for free under such liberal terms that I find it hard to believe that it wouldn't be compatible with Python. I don't know what to do about (2) or (3).
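The Scheme float->string rule is easy to sketch in Python itself: try successively more significant digits until the string reads back as exactly the same float. (This is a naive illustrative version, not Gay's efficient algorithm; `shortest_repr` is a name invented here.)

```python
def shortest_repr(x):
    # Try 1..17 significant digits; return the first string that
    # round-trips to exactly the same float.
    for digits in range(1, 18):
        s = '%.*g' % (digits, x)
        if float(s) == x:
            return s
    # 17 significant digits always round-trip for IEEE-754 doubles.
    return '%.17g' % x

print(shortest_repr(1.1))        # 1.1
print(shortest_repr(0.1 + 0.2))  # 0.30000000000000004
```

The pathological cases mentioned under (2) are exactly the floats for which this loop runs to high digit counts, and a careful implementation must also guarantee the digits are correctly rounded, which %g alone does not promise on every platform.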

On Tue, Mar 30, 2004, Andrew Koenig wrote:
I've read the whole thread, and I wanted to repeat a critical point for emphasis: This doesn't help No matter what you do to improve conversion issues, you're still dealing with the underlying floating-point problems, and having watched the changing discussions in c.l.py since we moved to the different conversion system, it seems clear to me that we've improved the nature of the discussion by forcing people to get bitten earlier. Facundo's Decimal module is the only way to improve the current situation. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "usenet imitates usenet" --Darkhawk

[Aahz]
Then it's time to segue into the "how come str(container) applies repr() to the containees?" debate, which usually follows this one, like chaos after puppies <0.9 wink>.
Facundo's Decimal module is the only way to improve the current situation.
The only short-term way to make a big difference, certainly.

On the other hand, it is pragmatically more convenient when an implementation prints floating-point literals that were entered with a small number of significant digits using that same number of significant digits. If I can enter a number as 0.1, printing that number as 0.1 does not introduce any errors that were not already there, as proved by the fact that reading that 0.1 back will yield exactly the same value.

On Wed, Mar 31, 2004, Andrew Koenig wrote:
Pragmatically more convenient by what metric? No matter how you slice it, binary floating point contains surprises for the uninitiated. The question is *WHEN* do you hammer the point home? I've yet to see you address this directly.
It's not a matter of introducing errors, it's a matter of making the errors visible. Python is, among other things, a language suitable for introducing people to computers. That's why the Zen of Python contains such gems as:

    Explicit is better than implicit.
    Errors should never pass silently.
    In the face of ambiguity, refuse the temptation to guess.

If you're going to continue pressing your point, please elucidate your reasoning in terms of Python's design principles.

Pragmatically more convenient by what metric?
Short output is easier to read than long output.
I haven't, because I'm unconvinced that there is a single right answer. Decimal floating-point has almost all the pitfalls of binary floating-point, yet I do not see anyone arguing against decimal floating-point on the basis that it makes the pitfalls less apparent.
If you're going to continue pressing your point, please elucidate your reasoning in terms of Python's design principles.
Beautiful is better than ugly. Simple is better than complex. Readability counts.

When I write programs that print floating-point numbers I usually want to see one of the following:

* a rounded representation with n significant digits, where n is significantly less than 17

* a rounded representation with n digits after the decimal point, where n is often 2

* the unbounded-precision exact decimal representation of the number (which always exists, because every binary floating-point number has a finite exact decimal representation)

* the most convenient (i.e. shortest) way of representing the number that will yield exactly the same result when read

Python gives me none of these, and instead gives me something else entirely that is almost never what I would like to see, given the choice. I understand that I have the option of requesting the first two of these choices explicitly, but I don't think there's a way to make any of them the default.

I'm not picking on Python specifically here, as I have similar objections to the floating-point behavior of most other languages aside from Scheme (which is not to my taste for other reasons). However, I do think that this issue is more subtle than one that can be settled by appealing to slogans. In particular, I *do* buy the argument that the current behavior is the best that can be efficiently achieved while relying on the underlying C floating-point conversions.

If you're really serious about hammering errors in early, why not have the compiler issue a warning any time a floating-point literal cannot be exactly represented? <0.5 wink>
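For what it's worth, the first three outputs on that list can each be produced explicitly in today's Python (the exact expansion via the decimal module), and the fourth later became repr()'s own behaviour; this quick sketch is illustrative, not part of the original thread:

```python
from decimal import Decimal

x = 1.1
print('%.6g' % x)   # n significant digits (here n=6): 1.1
print('%.2f' % x)   # n digits after the decimal point (here n=2): 1.10
print(Decimal(x))   # the exact, finite decimal expansion of the double
```

The last line prints the 52-digit expansion 1.1000000000000000888..., which is the "unbounded-precision exact decimal representation" Andrew describes.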

On Wed, Mar 31, 2004, Andrew Koenig wrote:
Actually, decimal floating point takes care of two of the pitfalls of binary floating point:

* binary/decimal conversion
* easily modified precision

When people are taught decimal arithmetic, they're usually taught the problems with it, so they aren't surprised. (e.g. 1/3)

Actually, decimal floating point takes care of two of the pitfalls of binary floating point:
* binary/decimal conversion
* easily modified precision
When people are taught decimal arithmetic, they're usually taught the problems with it, so they aren't surprised. (e.g. 1/3)
But doesn't that just push the real problems further into the background, making them more dangerous? <0.1 wink> For example, be it binary or decimal, floating-point addition is still not associative, so even such a simple computation as a+b+c requires careful thought if you want the maximum possible precision. Why are you not arguing against decimal floating-point, if your goal is to expose users to the problems of floating-point as early as possible?

[Andrew Koenig]
Not really for most everyday applications of decimal arithmetic. People work with decimal quantities in real life, and addition of fp decimals is exact (hence also associative) provided the total precision isn't exceeded. Since Decimal allows setting precision to whatever the user wants, it's very easy to pick a precision obviously so large that even adding a billion (e.g.) dollars-and-cents inputs yields the exact result, and regardless of addition order. For the truly paranoid, Decimal's "inexact flag" can be inspected at the end to see whether the exactness assumption was violated, and the absurdly paranoid can even ask that an exception get raised whenever an inexact result would have been produced. Binary fp loses in these common cases *just because* the true inputs can't be represented, and the number printed at the end isn't even the true result of approximately adding the approximated inputs. Decimal easily avoids all of that.
The overwhelmingly most common newbie binary fp traps today are failures to realize that the numbers they type aren't the numbers they get, and that the numbers they see aren't the results they got. Adding 0.1 to itself 10 times and not getting 1.0 exactly is universally considered to be "a bug" by newbies (but it is exactly 1.0 in decimal). OTOH, if they add 1./3. to itself 3 times under decimal and don't get exactly 1.0, they won't be surprised at all. It's the same principle at work in both cases, but they're already trained to expect 0.9...9 from the latter. The primary newbie difficulty with binary fp is that the simplest use case (just typing in an ordinary number) is already laced with surprises -- it already violates WYSIWYG, and insults a lifetime of "intuition" gained from by-hand and calculator math (of course it's not a coincidence that hand calculators use decimal arithmetic internally -- they need to be user-friendly). You have to do things fancier than *just* typing in the prices of grocery items to get in trouble with Decimal.
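Tim's 0.1 example is easy to check with the decimal module as it eventually shipped (a quick illustrative sketch, not part of the original thread):

```python
from decimal import Decimal

# Binary fp: ten 0.1's don't quite make 1.0 ...
print(sum([0.1] * 10) == 1.0)    # False
print(sum([0.1] * 10))           # 0.9999999999999999
# ... but decimal 0.1 is exact, so the sum is exactly 1.0.
print(sum([Decimal('0.1')] * 10) == Decimal('1.0'))  # True
```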

[Tim]
[Andrew Koenig]
Well, some of it. It still doesn't avoid 1E50 + 1E-50 == 1E50, for example.
It's not common for newbies to use exponential notation, and neither is it common "for most everyday applications of decimal arithmetic" (which I was talking about, in part of the context that got snipped) to have inputs spanning 100 orders of magnitude. If you know that *your* app has inputs spanning 100 orders of magnitude, and you care about every digit, then set Decimal precision to something exceeding 100 digits, and your sample addition will be exact (and then 1E50 + 1E-50 > 1E50, and exceeds the RHS by exactly 1E-50). That's what the "easily" in "easily avoids" means -- the ability to boost precision is very powerful!
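Here is what boosting precision looks like in the decimal module; the 101 digits below is just the smallest precision that makes this particular sum exact (the exact result spans digit positions 10^50 down to 10^-50):

```python
from decimal import Decimal, localcontext

a, b = Decimal('1E50'), Decimal('1E-50')
# At the default 28-digit precision the tiny addend is rounded away:
print(a + b == a)        # True
# The exact sum needs 101 significant digits; give it room:
with localcontext() as ctx:
    ctx.prec = 101
    s = a + b
print(s > a)             # True: now 1E50 + 1E-50 > 1E50
print(s - a == b)        # True: the sum was exact, off by exactly 0
```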
Well, I'm sure that pissing off everyone all the time would be a significant step backwards. BTW, so long as Python relies on C libraries for float<->string conversion, it also has no way to know which floating-point literals can't be exactly represented anyway.

Andrew Koenig <ark-mlist@att.net>:
But they're not the pitfalls at issue here. The pitfalls at issue are the ones due to binary floating point behaving *differently* from decimal floating point. Most people's mental model of arithmetic, including floating point, works in decimal. They can reason about it based on their experience with pocket calculators. They don't have any experience with binary floating point, though, so any additional oddities due to that are truly surprising and mysterious to them.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Andrew Koenig <ark-mlist@att.net>:
Er, you do realise this only happens when the number pops out in the interactive interpreter, or you use repr(), don't you? If you convert it with str(), or print it, you get something much more like what you seem to want.

"Andrew Koenig" <ark-mlist@att.net> wrote in message news:008401c4173a$82e7fc40$6402a8c0@arkdesktop...

    If I can enter a number as 0.1, printing that number as 0.1 does not
    introduce any errors that were not already there, as proved by the
    fact that reading that 0.1 back will yield exactly the same value.

If I enter 1.1000000000000001, I am not sure I would necessarily be happy if str() and repr() both gave the same highly rounded string representation ;-)

tjr

Andrew Koenig <ark-mlist@att.net>:
But "significant digits" is a concept that exists only in the mind of the user. How is the implementation to know how many of the digits are significant, or how many digits it was originally entered with? And what about numbers that result from a calculation, and weren't "entered" at all?

[Andrew Koenig]
[Greg Ewing]
The Decimal module has answers to such questions, following the proposed IBM decimal standard, which in turn follows long-time REXX practice. The representation is not normalized, and because of that is able to keep track of "significant" trailing zeroes. So, e.g., decimal 2.7 - 1.7 yields decimal 1.0 (neither decimal 1. nor decimal 1.00), while decimal 2.75 - 1.65 yields decimal 1.10, and 1.0 and 1.10 have different internal representations than decimal 1 and 1.1, or 1.00 and 1.100. "The rules" are spelled out in detail in the spec: http://www2.hursley.ibm.com/decimal/
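The decimal module behaves exactly as described, so Tim's examples can be checked directly (a quick sketch, not from the original thread):

```python
from decimal import Decimal

# The unnormalized representation keeps "significant" trailing zeroes:
print(Decimal('2.7') - Decimal('1.7'))    # 1.0
print(Decimal('2.75') - Decimal('1.65'))  # 1.10
# 1.0 and 1.00 are equal in value but distinct in representation:
print(Decimal('1.0') == Decimal('1.00'))  # True
print(str(Decimal('1.0')), str(Decimal('1.00')))  # 1.0 1.00
```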

[Aahz]
[Ping]
Naturally, I disagree. The immediate motivation at the time was that marshal uses repr(float) to store floats in code objects, so people who use floats seriously found that results differed between running a module directly and importing the same module via a .pyc/.pyo file. That's flatly intolerable for serious work. That could have been repaired by changing the marshal format, at some cost in compatibility headaches. But since we made the change anyway, it had a wonderful consequence: fp newbies gripe about an example very much like the above right away, and we have a tutorial appendix now that gives them crucial education about the issues involved early in their Python career. Before, they were bit by a large variety of subtler fp surprises much later in their Python life, harder to explain, each requiring a different detailed explanation. Since I'm the guy who traditionally tried to help newbies with stuff like that over the last decade, my testimony that life is 10x better after the change shouldn't be dismissed lightly. A display hook was added to sys so that people who give a rip (not naming Ping specifically <wink>) could write and share code to format interactive responses following whatever rules they can tolerate. It's still a surprise to me that virtually nobody seems to have cared enough to bother.
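For the record, such a display hook is only a few lines. Here is a hypothetical sketch in today's Python (`friendly_hook` is a name invented here; in 2.x the module was `__builtin__` rather than `builtins`):

```python
import builtins
import sys

def friendly_hook(value):
    # Hypothetical sys.displayhook: echo floats via str() instead of
    # repr(), and leave every other type alone.
    if value is None:
        return
    builtins._ = value          # the default hook also rebinds _
    if isinstance(value, float):
        print(str(value))
    else:
        print(repr(value))

sys.displayhook = friendly_hook
```

With this installed, interactive float results are shown with str()'s friendlier rounding, while marshal, repr(), and everything else are untouched.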

On Tue, 30 Mar 2004, Tim Peters wrote:
That doesn't make sense to me. If the .py file says "1.1" and the .pyc file says "1.1", you're going to get the same results. In fact, you've just given a stronger reason for keeping "1.1". Currently, compiling a .py file containing "1.1" produces a .pyc file containing "1.1000000000000001". .pyc files are supposed to be platform-independent. If these files are then run on a platform with different floating-point precision, the .py and the .pyc will produce different results.
This is terrible, not wonderful. The purpose of floating-point is to provide an abstraction that does the expected thing in most cases. To throw the IEEE book at beginners only distracts them from the main challenge of learning a new programming language.
That's because custom display isn't the issue here. It's the *default* behaviour that's causing all the trouble. Out of the box, Python should show that numbers evaluate to themselves. -- ?!ng

[Josiah Carlson]
I believe (please correct me if I'm wrong), that Python floats, on all platforms, are IEEE 754 doubles.
I don't know. Python used to run on some Crays that had their own fp format, with 5 bits less precision than an IEEE double but much greater dynamic range. I don't know whether VAX D double format is still in use either (which has 3 bits more precision than IEEE double).
That is, Python uses the 8-byte FP, not the (arguably worthless) 4-bit FP.
I believe all Python platforms use *some* flavor of 8-byte float.
Cross-platform precision is not an issue.
If it is, nobody has griped about it (to my knowledge).

At 2:40 PM -0500 3/30/04, Tim Peters wrote:
Don't be too sure. I've seen the VMS version getting thumped lately--someone may well be using X floats there. (which are 16 byte floats)

--
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

Did I miss the issue here? Floating point representations are a problem because, for some decimal representations, converting the decimal form to binary and then back to decimal does not (necessarily) return the same value. There's a large literature on this problem and known solutions. (See, for example, Guy Steele's paper on printing floating point.)

On Tue, 30 Mar 2004, Dan Sugalski wrote:

Josiah Carlson:
Python uses the 8-byte FP, not the (arguably worthless) 4-bit FP. ^^^^^
Yes, most people have hardware a little more sophisticated than an Intel 4004 these days. :-)

Shane Hathaway <shane@zope.com>:
Is that exponent in excess-1 format? :-)

[Ping] [Tim]
[Ping]
That doesn't make sense to me. If the .py file says "1.1" and the .pyc file says "1.1", you're going to get the same results.
repr(float) used to round to 12 significant digits (same as str() does now -- repr(float) and str(float) used to be identical). So the problem was real, and so was the fix.
But you can't get away from that via any decimal rounding rule. One of the *objections* the 754 committee had to the Scheme rule is that moving rounded shortest-possible decimal output to a platform with greater precision could cause the latter platform to read in an unnecessarily poor approximation to the actual number written on the source platform.

It's simply a fact that decimal 1.1000000000000001 is a closer approximation to the number stored in an IEEE double (given input "1.1" perfectly rounded to IEEE double format) than decimal 1.1, and that has consequences too when moving to a wider precision. You have in mind *typing* "1.1" literally, so that storing "1.1" would give a better approximation to decimal 1.1 on that box with wider precision, but repr() doesn't know whether its input was typed by hand or computed. Most floats in real life are computed.

So if we were to change the marshal format, it would make much more sense to reuse pickle's binary format for floats (which represents floats exactly, at least those that don't exceed the precision or dynamic range of a 754 double). The binary format also *is* portable. Relying on decimal strings (of any form) isn't really, so long as Python relies on the platform C to do string<->float conversion. Slinging shortest-possible output requires perfect rounding on input, which is stronger than the 754 standard requires. Slinging decimal strings rounded to 17 digits is less demanding, and is portable across all boxes whose C string->float meets the 754 standard.
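The trade-off Tim describes can be seen with the same %g formats the interpreter relied on (an illustrative sketch, not the actual repr() implementation):

```python
x = 1.1
print('%.17g' % x)              # 1.1000000000000001
print(float('%.17g' % x) == x)  # True: 17 digits always round-trip

y = 0.1 + 0.2                   # a *computed* float
print('%.12g' % y)              # 0.3 -- prettier, but lossy:
print(float('%.12g' % y) == y)  # False
print(float('%.17g' % y) == y)  # True
```

This is exactly why the old round-to-12 repr() broke marshal: for computed floats like `y`, the 12-digit string denotes a different double.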
But since we made the change anyway, it had a wonderful consequence: ...
This is terrible, not wonderful. ...
We've been through all this before, so I'm heartened to see that we'll still never agree <wink>.

On Tue, 2004-03-30 at 14:40, Tim Peters wrote:
We've been through all this before, so I'm heartened to see that we'll still never agree <wink>.
Perhaps we need to write a very short PEP that explains why things are the way they are. I enjoy the occasional break from discussions about decorator syntax as much as the next guy, but I'd rather discuss new controversies like generator expression binding rules than rehash the same old discussions. Jeremy

But if you're moving to a wider precision, surely there is an even better decimal approximation to the IEEE-rounded "1.1" than 1.1000000000000001 (with even more digits), so isn't the preceding paragraph a justification for using that approximation instead?

[Andrew Koenig]
Like Ping, you're picturing typing in "1.1" by hand, so that you *know* decimal 1.1 on-the-nose is the number you "really want". But repr() can't know that -- it's a general principle of 754 semantics for each operation to take the bits it's fed at face value, because the implementation can't guess intent, and it's likely to create more problems than it solves if it tries to "improve" the bits it actually sees. So far as reproducing observed results as closely as possible goes, the wider machine will in fact do better if it sees "1.1000000000000001" instead of "1.1", because the former is in fact a closer approximation to the number the narrower machine actually *uses*.

Suppose you had a binary float format with 3 bits of precision, and the result of a computation on that box is .001 binary = 1/8 = 0.125 decimal. The "shortest-possible reproducing decimal representation" on that box is 0.1. Is it more accurate to move that result to a wider machine via the string "0.1" or via the string "0.125"? The former is off by 25%, but the latter is exactly right. repr() on the former machine has no way to guess whether the 1/8 it's fed is the result of the user typing in "0.1" or the result of dividing 1.0 by 8.0. By taking the bits at face value, and striving to communicate that as faithfully as possible, it's explainable, predictable, and indeed as faithful as possible. "Looks pretty too" isn't a requirement for serious floating-point work.
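The fractions module makes the 3-bit example concrete, since every binary float is an exact rational (a quick check, not from the original thread):

```python
from fractions import Fraction

x = 1.0 / 8.0              # the 3-bit machine's .001 binary = 1/8
print(Fraction(x))         # 1/8 -- the float's exact value
# Shipping the value as "0.1" instead is off by a quarter of itself:
err = abs(Fraction(1, 10) - Fraction(x)) / Fraction(1, 10)
print(err)                 # 1/4, i.e. the 25% Tim mentions
```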

Tim Peters wrote:
It seems like most people who write '1.1' don't really want to dive into serious floating-point work. I wonder if literals like 1.1 should generate decimal objects as described by PEP 327, rather than floats. Operations on decimals that have no finite decimal representation (like 1.1 / 3) should use floating-point arithmetic internally, but they should store in decimal rather than floating-point format. Shane

[Shane Hathaway]
Well, one of the points of the Decimal module is that it gives results that "look like" people get from pencil-and-paper math (or hand calculators). So, e.g., I think the newbie traumatized by not getting back "0.1" after typing 0.1 would get just as traumatized if moving to binary fp internally caused 1.1 / 3.3 to look like

    0.33333333333333337

instead of

    0.33333333333333333

If they stick to Decimal throughout, they will get the latter result (and they'll continue to get a string of 3's for as many digits as they care to ask for). Decimal doesn't suffer string<->float conversion errors, but beyond that it's prone to all the same other sources of error as binary fp. Decimal's saving grace is that the user can boost working precision to well beyond the digits they care about in the end. Kahan always wrote that the best feature of IEEE-754 to ease the lives of the fp-naive is the "double extended" format, and HW support for that is built in to all Pentium chips. Alas, most compilers and languages give no access to it. The only thing Decimal will have against it in the end is runtime sloth.
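Both halves of that claim check out with the decimal module (the binary result is as it appears on IEEE-754 doubles; this sketch is not from the original thread):

```python
from decimal import Decimal, localcontext

# Binary fp: the quotient picks up a visible last-digit artifact.
print(1.1 / 3.3)                         # 0.33333333333333337 here
# Decimal: a string of 3's at the default 28-digit precision ...
print(Decimal('1.1') / Decimal('3.3'))   # 0.3333333333333333333333333333
# ... and for as many digits as you care to ask for.
with localcontext() as ctx:
    ctx.prec = 40
    print(Decimal('1.1') / Decimal('3.3'))
```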

On Tue, 30 Mar 2004, Tim Peters wrote:
The only thing Decimal will have against it in the end is runtime sloth.
While the Decimal implementation is in Python, certainly. However, I did some floating point calculation timings a while back, and the Python FP system is slow due to the overhead of working out types, unpacking the values, and repacking the value. The actual native calculation is a small portion of that time. My question is: Is it possible that a C implementation of Decimal would be almost as fast as native floating point in Python for reasonable digit lengths and settings? (ie. use native FP as an approximation and then do some tests to get the last digit right). The intent here is to at least propose as a strawman that Python use a C implementation of Decimal as its native floating point type. This is similar to the long int/int unification. Long ints are slow, but things are okay as long as the numbers are within the native range. The hope would be that Decimal configurations which fit within the machine format are reasonably fast, but things outside it slow down. Please note that nowhere did I comment that creating such a C implementation of Decimal would be easy or even possible. ;) -a
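One crude way to put numbers on that question is a micro-benchmark along these lines (hypothetical; `bench` is a name invented here, absolute times vary wildly by machine, and this measures interpreter overhead as much as arithmetic):

```python
import time
from decimal import Decimal

def bench(a, b, n=100_000):
    # Time n multiply-adds on values of a's type; a rough proxy only.
    total = type(a)(0)
    t0 = time.perf_counter()
    for _ in range(n):
        total += a * b
    return time.perf_counter() - t0

print('float  : %.3fs' % bench(1.1, 3.3))
print('Decimal: %.3fs' % bench(Decimal('1.1'), Decimal('3.3')))
```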

On Tue, Mar 30, 2004, Andrew P. Lentvorski, Jr. wrote:
Basic answer: yes, for people not doing serious number crunching
Well, that won't happen. The long/int issue at least has compatibility at the binary level; binary/decimal conversions lead us right back to the problems that Decimal is trying to fix.
Please note that nowhere did I comment that creating such a C implementation of Decimal would be easy or even possible. ;)
Actually, the whole point of the Decimal class is that it's easy to implement. Once we agree on the API and semantics, converting to C should be not much harder than trivial. Although I ended up dropping the ball, that's the whole reason I got involved with Decimal in the first place: the intention is that Decimal written in C will release the GIL. It will be an experiment in computational threading.

"Andrew P. Lentvorski, Jr." <bsder@allcaps.org>:
That sounds like an extremely tricky thing to do, and it's not immediately obvious that it's even possible. But maybe it would still be "fast enough" doing it all properly in decimal?

On Tue, 30 Mar 2004, Tim Peters wrote:
All right. Maybe we can make some progress. I agree that round-to-12 was a real problem. But i think we are talking about two different use cases: compiling to disk and displaying on screen. I think we can satisfy both desires.

If i understand you right, your primary aim is to make sure the marshalled form of any floating-point number yields the closest possible binary approximation to the machine value on the original platform, even when that representation is used on a different platform. (Is that correct? Perhaps it's best if you clarify -- exactly what is the invariant you want to maintain, and what changes [in platform or otherwise] do you want the representation to withstand?)

That doesn't have to be part of repr()'s contract. (In fact, i would argue that already repr() makes no such promise.) repr() is about providing a representation for humans.

Can we agree on maximal precision for marshalling, and shortest-accurate precision for repr, so we can both be happy? (By shortest-accurate i mean "the shortest representation that converts to the same machine number". I believe this is exactly what Andrew described as Scheme's method. If you are very concerned about this being a complex and/or slow operation, a fine compromise would be a "try-12" algorithm: if %.12g is accurate then use it, and otherwise use %.17g. This is simple, easy to implement, produces reasonable results in most cases, and has a small bound on CPU cost.)

    def try_12(x):
        rep = '%.12g' % x
        if float(rep) == x:
            return rep
        return '%.17g' % x

    def shortest_accurate(x):
        for places in range(17):
            fmt = '%.' + str(places) + 'g'
            rep = fmt % x
            if float(rep) == x:
                return rep
        return '%.17g' % x

-- ?!ng

[Ping]
All right. Maybe we can make some progress.
Probably not -- we have indeed been thru all of this before.
Those are two of the use cases, yes.
I want marshaling of fp numbers to give exact (not approximate) round-trip equality on a single box, and across all boxes supporting the 754 standard where C maps "double" to a 754 double. I also want marshaling to preserve as much accuracy as possible across boxes with different fp formats, although that may not be practical. Strings have nothing to do with that, except for the historical accident that marshal happens to use decimal strings. Changing repr(float) to produce 17 digits went a very long way toward achieving all that at the time, with minimal code changes. The consequences of that change I *really* like didn't become apparent for years.
As above. Beyond exact equality across suitable 754 boxes, we'd have to agree on a parameterized model of fp, and explain "as much accuracy as possible" in terms of that. But you don't care, so I won't bother <wink>.
That doesn't have to be part of repr()'s contract. (In fact, i would argue that already repr() makes no such promise.)
It doesn't, but the docs do say:

    If at all possible, this [repr's result] should look like a valid
    Python expression that could be used to recreate an object with the
    same value (given an appropriate environment).

This is possible for repr(float), and is currently true for repr(float) (on 754-conforming boxes).
repr() is about providing a representation for humans.
I think the docs are quite clear that this function belongs to str():

    ... the ``informal'' string representation of an object. This differs
    from __repr__() in that it does not have to be a valid Python
    expression: a more convenient or concise representation may be used
    instead. The return value must be a string object.
Can we agree on maximal precision for marshalling,
I don't want to use strings at all for marshalling. So long as we do, 17 is already correct for that purpose (< 17 doesn't preserve equality, > 17 can't be relied on across 754-conforming C libraries).
and shortest-accurate precision for repr, so we can both be happy?
As I said before (again and again and again <wink>), I'm the one who has fielded most newbie questions about fp since Python's beginning, and I'm very happy with the results of changing repr() to produce 17 digits. They get a little shock at the start now, but potentially save themselves from catastrophe by being forced to grow some *necessary* caution about fp results early. So, no, we're not going to agree on this. My answer for newbies who don't know and don't care (and who are determined never to know or care) has always been to move to a Decimal module. That's less surprising than binary fp in several ways, and 2.4 will have it.
Yes, except that Scheme also requires that this string be correctly rounded to however many digits are produced. A string s just satisfying eval(s) == some_float needn't necessarily be correctly rounded to s's precision.
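The Scheme rule can be approximated in pure Python with a brute-force search (a sketch only -- real implementations such as David Gay's produce the correctly rounded shortest string without trial and error, and this version is only as correct as the platform's string->float conversion):

```python
def shortest_repr(x):
    """Return the decimal string with the fewest significant digits
    that reads back as exactly the float x."""
    for ndigits in range(1, 18):
        s = '%.*g' % (ndigits, x)
        if float(s) == x:
            return s
    return '%.17g' % x  # 17 digits always suffice for a 754 double

print(shortest_repr(1.1))        # 1.1
print(shortest_repr(0.1 + 0.2))  # 0.30000000000000004
```

As it happens, shortest round-tripping output is the behavior CPython eventually adopted for repr(float), in 2.7 and 3.1, using Gay-style correctly rounded conversion.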
Also unique to Python.
Our disagreement is more fundamental than that. Then again, it always has been <smile!>.

On Tue, 30 Mar 2004, Tim Peters wrote:
That is a valuable property. I support it and support Python continuing to have that property. I hope it has been made quite clear by now that this property does not constrain how numbers are displayed by the interpreter in human-readable form. The issue of choosing an appropriate string representation of a number is unaffected by the desire for the above property.
I think we *have* made progress. Now we can set aside the red-herring issue of platform-independent serialization and focus on the real issue: human-readable string representation. So let's look at what you said about Python's accessibility:
Now you are pulling rank. I cannot dispute your longer history and greater experience with Python; it is something i greatly admire and respect. I also don't know your personal experiences teaching Python. But i can tell you my experiences.

And i can tell you that i have tried to teach Python to many people, individually and in groups. I taught a class in Python at UC Berkeley last spring to 22 people who had never used Python before. I maintained good communication with the students and their feedback was very positive about the class.

How did the class react to floating-point? Seeing behaviour like this:

    >>> 3.3
    3.2999999999999998
    >>>

confused and frightened them, and continues to confuse and frighten almost everyone i teach. (The rare exceptions are the people who have done lots of computational work before and know how binary floating-point representations work.) Every time this happens, the teaching is derailed and i am forced to go into an explanation of binary floating-point to assuage their fears.

Remember, i am trying to teach basic programming skills. How to solve problems; how to break down problems into steps; what's a subroutine; and so on. Aside from this floating-point thing throwing them off, Python is a great first language for new programmers. This is not the time to talk about internal number representation.

I am tired of making excuses for Python. I love to tell people about Python and show them what it can do for them. But this floating-point problem is embarrassing. People are confused because no other system they've seen behaves like this. Other languages don't print their numbers like this. Accounting programs and spreadsheets don't print their numbers like this. Matlab and Maple and Mathematica don't print their numbers like this. Only Python insists on being this ugly. And it screws up the most common way that people first get to know Python -- as a handy interactive calculator.

And for what?
For no gain at all -- because when you limit your focus to the display issue, the only argument you're making is "People should be frightened." That's a pointless reason. Everything in Python -- everything in computers, in fact -- is a *model*. We don't expect the model to be perfectly accurate or to be completely free of limitations. IEEE 754 happens to be the prevalent model for the real number line. We don't print every string with a message after it saying "WARNING: MAXIMUM LENGTH 4294967296", and we shouldn't do the same for floats.

"3.2999999999999998" does not give you any more information than "3.3". They both represent exactly the same value. "3.3" is vastly easier to read. The only reason you seem to want to display "3.2999999999999998" is to frighten people. So why not display "3.3 DANGER DANGER!!"? Even that would be much easier to read, but my point is that i hope it exposes the problem with your argument.

I'm asking you to be more realistic here. Not everyone runs into floating-point corner cases. In fact, very few people do. I have never encountered such a problem in my entire history of using Python. And if you surveyed the user community, i'm sure you would find that only a small minority cares enough about the 17th decimal place for the discrepancy to be an issue. Now, why should that minority make it everyone else's problem?

Is this CP4E or CPFPFWBFP? Computer Programming For Everybody? Or Computer Programming For People Familiar With Binary Floating-Point? You say it's better for people to get "bitten early". What's better: everyone suffering for a problem that will never affect most of them, or just those who care about the issue having to deal with it?

Beautiful is better than ugly. Practicality beats purity. Readability counts.

-- ?!ng

On 2004-04-06, at 14.46, Ka-Ping Yee wrote:
So how should "2.2 - 1.2 - 1" be represented?

    Matlab (Solaris 9):            2.22044604925031e-16
    Octave (MacOS X 10.3):         2.22044604925031e-16
    Python 2.3.3 (MacOS X 10.3):   2.2204460492503131e-16

Is this something you accept since Matlab does it?

On Tue, 6 Apr 2004, Simon Percivall wrote:
I accept it, but not primarily because Matlab does it. It is fine to show a value different from zero in this case because the result really is different from zero. In this case those digits are necessary to represent the machine number. In other cases (such as 1.1) they are not. -- ?!ng

On Tue, Apr 06, 2004, Ka-Ping Yee wrote:
The point is that anyone who relies on floating point will almost certainly end up with the equivalent of Simon's case at some point. The most common *really* ugly case that's hard for new programmers to deal with is:

    section_size = length / num_sections
    tmp = 0
    while tmp < length:
        process(section)
        tmp += section_size

and variations thereof. Just because every other programming language hides this essential difficulty is no reason to follow along.

I don't have Tim's experience, but until we added that section to the tutorial, I was one of the regular "first responders" on c.l.py who was always dealing with this issue. I stand with Uncle Timmy -- it's been much easier now.

If you want to teach Python to complete newcomers, you need to do one of two things:

* Avoid floating point
* Start by explaining what's going on (use Simon's example above for emphasis)

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
Why is this newsgroup different from all other newsgroups?
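Aahz's loop is worth actually running. With illustrative values (length 1.0 split into 10 sections -- my choice, not from the thread), the accumulated representation error makes the loop body execute one extra time:

```python
length = 1.0
num_sections = 10
section_size = length / num_sections  # 0.1 is not exact in binary fp

count = 0
tmp = 0.0
while tmp < length:
    count += 1
    tmp += section_size

# After ten additions tmp is 0.9999999999999999, still < 1.0,
# so the loop runs an 11th time.
print(count)  # 11, not 10
```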

This is an argument for making some form of decimal floating point the default in Python, with binary FP an option for advanced users -- not an argument for trying to hide the fact that binary FP is being used. Because you *can't* hide that fact completely, and as has been pointed out, it *will* rear up and bite these people eventually. It's much better if that happens early, while there is someone on hand who understands the issues and can guide them gently through the shock-and-awe phase. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

[Tim]
[Ping]
That is a valuable property. I support it and support Python continuing to have that property.
That's good, since nobody opposes it <wink>.
...
I don't think that's "the" real issue, but it is one of several. ...
Now you are pulling rank.
I'm relating my experience, which informs my beliefs about these issues more than any "head argument".
In person, mostly to hardware geeks and other hardcore software geeks. On mailing lists and newsgroups, to all comers, although I've had decreasing time for that as the years drag on.
Sorry, but so long as they stick to binary fp, stuff like that can't be avoided, even using "your rule" (other examples of that were posted today, and I won't repeat them again here). I liked Python's *former* rule myself (repr rounds to 12 significant digits), and would like it much better than "shortest possible" (which still shows tons of crap I usually don't care about) most days for my own uses.

That's a real problem Python hasn't addressed: its format decisions are often inappropriate and/or undesirable (both can differ by app and by audience and by object type), and there are insufficient hooks for overriding these decisions. sys.displayhook goes a bit in that direction, but not far enough.

BTW, if your students *remain* confused & frightened, it could be you're not really trying to explain binary fp reality to them.
Then they *don't* remain confused & frightened? Great. Then they've been educated. How long can it take to read the Tutorial Appendix? It's well worth however many years it takes <wink>.
Use Decimal instead. That's always been the best idea for newbies (and for most casual users of floating-point, newbie or not).
If you're teaching "basic programming skills", what other systems have they seen? Hand calculators for sure -- which is why they should use Decimal instead. Virtually nothing about it will surprise them, except the liberating ability to crank up the precision.
Other languages don't print their numbers like this. Accounting programs and spreadsheets don't print their numbers like this.
I don't care -- really. I'm thoroughly in agreement with Kahan on this; see, e.g., the section "QPRO 4.0 and QPRO for Windows" in http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf

    ... the reader can too easily misinterpret a few references to 15 or 16 sig. dec. of precision as indications that no more need be said about QPRO's arithmetic. Actually much more needs to be said because some of it is bizarre. Decimal displays of Binary nonintegers cannot always be WYSIWYG. Trying to pretend otherwise afflicts both customers and implementors with bugs that go mostly misdiagnosed, so fixing one bug merely spawns others. ...

    The correct cure for the @ROUND and @INT (and some other) bugs is not to fudge their argument but to increase from 15 to 17 the maximum number of sig. dec. that users of QPRO may see displayed. But no such cure can be liberated from little annoyances:

    [snip things that make Ping's skin crawl about Python today]

    ... For Quattro's intended market, mostly small businesses with little numerical expertise, a mathematically competent marketing follow-through would have chosen either to educate customers about binary floating-point or, more likely, to adopt decimal floating-point arithmetic even if it runs benchmarks slower.

The same cures are appropriate for Python.
Matlab and Maple and Mathematica don't print their numbers like this.
Those are designed for experts (although Mathematica pretends not to be).
Sorry, that's an absurd recharacterization, and I won't bother responding to it. If you really can't see any more to "my side" of the argument than that yet, then repeating it another time isn't going to help. So enough of this. In what time I can make for "stuff like this", I'm going to try to help the Decimal module along instead. Do what you want with interactive display of non-decimal floats, but do try to make it flexible instead of fighting tooth and nail just to replace one often-hated fixed behavior with another to-be-often-hated fixed behavior. ...
Not everyone runs into floating-point corner cases. In fact, very few people do.
Heh. I like to think that part of that has to do with the change to repr()! As I've said many times before, we *used* to get reports of a great variety of relatively *subtle* problems due to binary fp behavior from newbies; we generally get only one now, and the same one every time. They're not stupid, Ping, they just need the bit of education it takes to learn something about that expensive fp hardware they bought.
I have never encountered such a problem in my entire history of using Python.
Pride goeth before the fall ...
The result of int() can change by 1 when the last bit changes, and the difference between 2 and 3 can be a disaster -- see Kahan (op. cit.) for a tale of compounded woe following from that one. Aahz's recent example of a loop going around one time more or less "than expected" used to be very common, and is the same thing in a different guise. It's like security that way: nobody gives a shit before they get burned, and then they get livid about it. If a user believes 0.1 is one tenth, they're going to get burned by it.
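The int() hazard Tim describes is easy to reproduce. A classic illustration (my example, not from the thread), in modern Python:

```python
# 0.1 + 0.7 is stored as the double just *below* 0.8, so multiplying
# by 10 yields a value just below 8, and int() truncates it to 7 --
# the last bit changes the answer by 1.
print(0.1 + 0.7)              # 0.7999999999999999
print(int((0.1 + 0.7) * 10))  # 7, not the 8 a newbie expects
```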
The force of this is lost because you don't have a way to spare users from "unexpected extra digits" either. It comes with the territory! It's inherent in using binary fp in a decimal world. All you're really going on about is showing "funny extra digits" less often -- which will make them all the more mysterious when they show up. I liked the former round-to-12-digits behavior much better on that count. I expect to like Decimal mounds better on all counts except speed.


On Tue, Mar 30, 2004, Andrew Koenig wrote:
I've read the whole thread, and I wanted to repeat a critical point for emphasis: this doesn't help. No matter what you do to improve conversion issues, you're still dealing with the underlying floating-point problems, and having watched the changing discussions in c.l.py since we moved to the different conversion system, it seems clear to me that we've improved the nature of the discussion by forcing people to get bitten earlier. Facundo's Decimal module is the only way to improve the current situation.

[Aahz]
Then it's time to segue into the "how come str(container) applies repr() to the containees?" debate, which usually follows this one, like chaos after puppies <0.9 wink>.
Facundo's Decimal module is the only way to improve the current situation.
The only short-term way to make a big difference, certainly.

On the other hand, it is pragmatically more convenient when an implementation prints the values of floating-point literals with a small number of significant digits with the same number of significant digits with which they were entered. If I can enter a number as 0.1, printing that number as 0.1 does not introduce any errors that were not already there, as proved by the fact that reading that 0.1 back will yield exactly the same value.

On Wed, Mar 31, 2004, Andrew Koenig wrote:
Pragmatically more convenient by what metric? No matter how you slice it, binary floating point contains surprises for the uninitiated. The question is *WHEN* do you hammer the point home? I've yet to see you address this directly.
It's not a matter of introducing errors, it's a matter of making the errors visible. Python is, among other things, a language suitable for introducing people to computers. That's why the Zen of Python contains such gems as:

    Explicit is better than implicit.
    Errors should never pass silently.
    In the face of ambiguity, refuse the temptation to guess.

If you're going to continue pressing your point, please elucidate your reasoning in terms of Python's design principles.

Pragmatically more convenient by what metric?
Short output is easier to read than long output.
I haven't, because I'm unconvinced that there is a single right answer. Decimal floating-point has almost all the pitfalls of binary floating-point, yet I do not see anyone arguing against decimal floating-point on the basis that it makes the pitfalls less apparent.
If you're going to continue pressing your point, please elucidate your reasoning in terms of Python's design principles.
Beautiful is better than ugly. Simple is better than complex. Readability counts.

When I write programs that print floating-point numbers I usually want to see one of the following:

* a rounded representation with n significant digits, where n is significantly less than 17
* a rounded representation with n digits after the decimal point, where n is often 2
* the unbounded-precision exact decimal representation of the number (which always exists, because every binary floating-point number has a finite exact decimal representation)
* the most convenient (i.e. shortest) way of representing the number that will yield exactly the same result when read

Python gives me none of these, and instead gives me something else entirely that is almost never what I would like to see, given the choice. I understand that I have the option of requesting the first two of these choices explicitly, but I don't think there's a way to make any of them the default.

I'm not picking on Python specifically here, as I have similar objections to the floating-point behavior of most other languages aside from Scheme (which is not to my taste for other reasons). However, I do think that this issue is more subtle than one that can be settled by appealing to slogans. In particular, I *do* buy the argument that the current behavior is the best that can be efficiently achieved while relying on the underlying C floating-point conversions.

If you're really serious about hammering errors in early, why not have the compiler issue a warning any time a floating-point literal cannot be exactly represented? <0.5 wink>
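For what it's worth, the first three of these wishes can be expressed in today's Python (a sketch; the exact-decimal trick of passing a float straight to Decimal only arrived in Python 2.7, well after this thread):

```python
from decimal import Decimal

x = 1.1
print('%.6g' % x)  # rounded to 6 significant digits: 1.1
print('%.2f' % x)  # rounded to 2 digits after the point: 1.10

# Every binary float has a finite exact decimal expansion:
print(Decimal(x))
# 1.100000000000000088817841970012523233890533447265625
```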

On Wed, Mar 31, 2004, Andrew Koenig wrote:
Actually, decimal floating point takes care of two of the pitfalls of binary floating point:

* binary/decimal conversion
* easily modified precision

When people are taught decimal arithmetic, they're usually taught the problems with it, so they aren't surprised. (e.g. 1/3)

Actually, decimal floating point takes care of two of the pitfalls of binary floating point:
* binary/decimal conversion
* easily modified precision
When people are taught decimal arithmetic, they're usually taught the problems with it, so they aren't surprised. (e.g. 1/3)
But doesn't that just push the real problems further into the background, making them more dangerous? <0.1 wink> For example, be it binary or decimal, floating-point addition is still not associative, so even such a simple computation as a+b+c requires careful thought if you wish the maximum possible precision. Why are you not arguing against decimal floating-point if your goal is to expose users to the problems of floating-point as early as possible?

[Andrew Koenig]
Not really for most everyday applications of decimal arithmetic. People work with decimal quantities in real life, and addition of fp decimals is exact (hence also associative) provided the total precision isn't exceeded. Since Decimal allows setting precision to whatever the user wants, it's very easy to pick a precision obviously so large that even adding a billion (e.g.) dollars-and-cents inputs yields the exact result, and regardless of addition order. For the truly paranoid, Decimal's "inexact flag" can be inspected at the end to see whether the exactness assumption was violated, and the absurdly paranoid can even ask that an exception get raised whenever an inexact result would have been produced. Binary fp loses in these common cases *just because* the true inputs can't be represented, and the number printed at the end isn't even the true result of approximately adding the approximated inputs. Decimal easily avoids all of that.
The overwhelmingly most common newbie binary fp traps today are failures to realize that the numbers they type aren't the numbers they get, and that the numbers they see aren't the results they got. Adding 0.1 to itself 10 times and not getting 1.0 exactly is universally considered to be "a bug" by newbies (but it is exactly 1.0 in decimal). OTOH, if they add 1./3. to itself 3 times under decimal and don't get exactly 1.0, they won't be surprised at all. It's the same principle at work in both cases, but they're already trained to expect 0.9...9 from the latter.

The primary newbie difficulty with binary fp is that the simplest use case (just typing in an ordinary number) is already laced with surprises -- it already violates WYSIWYG, and insults a lifetime of "intuition" gained from by-hand and calculator math (of course it's not a coincidence that hand calculators use decimal arithmetic internally -- they need to be user-friendly). You have to do things fancier than *just* typing in the prices of grocery items to get in trouble with Decimal.
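Both halves of Tim's point are directly observable with the decimal module that shipped in Python 2.4 (a sketch in modern Python):

```python
from decimal import Decimal

total_binary = 0.0
total_decimal = Decimal('0')
for _ in range(10):
    total_binary += 0.1              # each 0.1 is only an approximation
    total_decimal += Decimal('0.1')  # exactly one tenth

print(total_binary == 1.0)            # False -- the newbie "bug"
print(total_decimal == Decimal('1'))  # True  -- exact in decimal

# The flip side: 1/3 is inexact in decimal too, but that surprises nobody.
third = Decimal(1) / Decimal(3)
print(third + third + third)  # 0.9999999999999999999999999999
```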

[Tim]
[Andrew Koenig]
Well, some of it. It still doesn't avoid 1E50 + 1E-50 == 1E50, for example.
It's not common for newbies to use exponential notation, and neither is it common "for most everyday applications of decimal arithmetic" (which I was talking about, in part of the context that got snipped) to have inputs spanning 100 orders of magnitude. If you know that *your* app has inputs spanning 100 orders of magnitude, and you care about every digit, then set Decimal precision to something exceeding 100 digits, and your sample addition will be exact (and then 1E50 + 1E-50 > 1E50, and exceeds the RHS by exactly 1E-50). That's what the "easily" in "easily avoids" means -- the ability to boost precision is very powerful!
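The precision boost Tim describes looks like this in the decimal module as eventually shipped (a sketch; the module postdates this message):

```python
from decimal import Decimal, getcontext

a, b = Decimal('1E50'), Decimal('1E-50')

getcontext().prec = 28  # the default: the tiny term rounds away
print(a + b == a)       # True

getcontext().prec = 110  # enough digits to hold both terms exactly
print(a + b == a)        # False
print(a + b - a)         # 1E-50 -- the sum was exact
```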
Well, I'm sure that pissing off everyone all the time would be a significant step backwards. BTW, so long as Python relies on C libraries for float<->string conversion, it also has no way to know which floating-point literals can't be exactly represented anyway.

Andrew Koenig <ark-mlist@att.net>:
But they're not the pitfalls at issue here. The pitfalls at issue are the ones due to binary floating point behaving *differently* from decimal floating point. Most people's mental model of arithmetic, including floating point, works in decimal. They can reason about it based on their experience with pocket calculators. They don't have any experience with binary floating point, though, so any additional oddities due to that are truly surprising and mysterious to them.

Andrew Koenig <ark-mlist@att.net>:
Er, you do realise this only happens when the number pops out in the interactive interpreter, or you use repr(), don't you? If you convert it with str(), or print it, you get something much more like what you seem to want.

"Andrew Koenig" <ark-mlist@att.net> wrote in message news:008401c4173a$82e7fc40$6402a8c0@arkdesktop...

    If I can enter a number as 0.1, printing that number as 0.1 does not introduce any errors that were not already there, as proved by the fact that reading that 0.1 back will yield exactly the same value.

If I enter 1.1000000000000001, I am not sure I would necessarily be happy if str() and repr() both gave the same highly rounded string representation ;-)

tjr

Andrew Koenig <ark-mlist@att.net>:
But "significant digits" is a concept that exists only in the mind of the user. How is the implementation to know how many of the digits are significant, or how many digits it was originally entered with? And what about numbers that result from a calculation, and weren't "entered" at all?

[Andrew Koenig]
[Greg Ewing]
The Decimal module has answers to such questions, following the proposed IBM decimal standard, which in turn follows long-time REXX practice. The representation is not normalized, and because of that is able to keep track of "significant" trailing zeroes. So, e.g., decimal 2.7 - 1.7 yields decimal 1.0 (neither decimal 1. nor decimal 1.00), while decimal 2.75 - 1.65 yields decimal 1.10, and 1.0 and 1.10 have different internal representations than decimal 1 and 1.1, or 1.00 and 1.100. "The rules" are spelled out in detail in the spec: http://www2.hursley.ibm.com/decimal/
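The unnormalized behavior Tim describes can be seen directly in the decimal module as it was eventually shipped (a sketch in modern Python):

```python
from decimal import Decimal

# Exact results keep "significant" trailing zeroes: the exponent of an
# exact sum or difference is the minimum of the operands' exponents.
print(Decimal('2.7') - Decimal('1.7'))    # 1.0
print(Decimal('2.75') - Decimal('1.65'))  # 1.10

# Numerically equal, yet internally distinct representations:
print(Decimal('1.10') == Decimal('1.1'))  # True
print(Decimal('1.10'), Decimal('1.1'))    # 1.10 1.1
```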

[Aahz]
[Ping]
Naturally, I disagree. The immediate motivation at the time was that marshal uses repr(float) to store floats in code objects, so people who use floats seriously found that results differed between running a module directly and importing the same module via a .pyc/.pyo file. That's flatly intolerable for serious work. That could have been repaired by changing the marshal format, at some cost in compatibility headaches.

But since we made the change anyway, it had a wonderful consequence: fp newbies gripe about an example very much like the above right away, and we have a tutorial appendix now that gives them crucial education about the issues involved early in their Python career. Before, they were bit by a large variety of subtler fp surprises much later in their Python life, harder to explain, each requiring a different detailed explanation. Since I'm the guy who traditionally tried to help newbies with stuff like that over the last decade, my testimony that life is 10x better after the change shouldn't be dismissed lightly.

A display hook was added to sys so that people who give a rip (not naming Ping specifically <wink>) could write and share code to format interactive responses following whatever rules they can tolerate. It's still a surprise to me that virtually nobody seems to have cared enough to bother.


On Tue, 30 Mar 2004, Tim Peters wrote:
That doesn't make sense to me. If the .py file says "1.1" and the .pyc file says "1.1", you're going to get the same results. In fact, you've just given a stronger reason for keeping "1.1". Currently, compiling a .py file containing "1.1" produces a .pyc file containing "1.1000000000000001". .pyc files are supposed to be platform-independent. If these files are then run on a platform with different floating-point precision, the .py and the .pyc will produce different results.
This is terrible, not wonderful. The purpose of floating-point is to provide an abstraction that does the expected thing in most cases. To throw the IEEE book at beginners only distracts them from the main challenge of learning a new programming language.
That's because custom display isn't the issue here. It's the *default* behaviour that's causing all the trouble. Out of the box, Python should show that numbers evaluate to themselves. -- ?!ng

[Josiah Carlson]
I believe (please correct me if I'm wrong), that Python floats, on all platforms, are IEEE 754 doubles.
I don't know. Python used to run on some Crays that had their own fp format, with 5 bits less precision than an IEEE double but much greater dynamic range. I don't know whether VAX D double format is still in use either (which has 3 bits more precision than IEEE double).
That is, Python uses the 8-byte FP, not the (arguably worthless) 4-bit FP.
I believe all Python platforms use *some* flavor of 8-byte float.
Cross-platform precision is not an issue.
If it is, nobody has griped about it (to my knowledge).

At 2:40 PM -0500 3/30/04, Tim Peters wrote:
Don't be too sure. I've seen the VMS version getting thumped lately--someone may well be using X floats there. (which are 16 byte floats) -- Dan --------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@sidhe.org have teddy bears and even teddy bears get drunk

Did I miss the issue here? Floating point representations are a problem because for some decimal representations converting the decimal form to binary and then back to decimal does not (necessarily) return the same value. There's a large literature on this problem and known solutions. (See, for example, Guy Steele's paper on printing floating point.) On Tue, 30 Mar 2004, Dan Sugalski wrote:

Josiah Carlson:
Python uses the 8-byte FP, not the (arguably worthless) 4-bit FP.
                                                        ^^^^^
Yes, most people have hardware a little more sophisticated than an Intel 4004 these days. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Shane Hathaway <shane@zope.com>:
Is that exponent in excess-1 format? :-) -- Greg Ewing

[Ping] [Tim]
[Ping]
That doesn't make sense to me. If the .py file says "1.1" and the .pyc file says "1.1", you're going to get the same results.
repr(float) used to round to 12 significant digits (same as str() does now -- repr(float) and str(float) used to be identical). So the problem was real, and so was the fix.
But you can't get away from that via any decimal rounding rule. One of the *objections* the 754 committee had to the Scheme rule is that moving rounded shortest-possible decimal output to a platform with greater precision could cause the latter platform to read in an unnecessarily poor approximation to the actual number written on the source platform. It's simply a fact that decimal 1.1000000000000001 is a closer approximation to the number stored in an IEEE double (given input "1.1" perfectly rounded to IEEE double format) than decimal 1.1, and that has consequences too when moving to a wider precision.

You have in mind *typing* "1.1" literally, so that storing "1.1" would give a better approximation to decimal 1.1 on that box with wider precision, but repr() doesn't know whether its input was typed by hand or computed. Most floats in real life are computed.

So if we were to change the marshal format, it would make much more sense to reuse pickle's binary format for floats (which represents floats exactly, at least those that don't exceed the precision or dynamic range of a 754 double). The binary format also *is* portable.

Relying on decimal strings (of any form) isn't really, so long as Python relies on the platform C to do string<->float conversion. Slinging shortest-possible output requires perfect rounding on input, which is stronger than the 754 standard requires. Slinging decimal strings rounded to 17 digits is less demanding, and is portable across all boxes whose C string->float meets the 754 standard.
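Tim's claim about which decimal string better approximates the stored double can be checked with exact rational arithmetic (a sketch assuming IEEE-754 doubles; the `fractions` module is a later stdlib addition used here only for illustration):

```python
from fractions import Fraction

x = 1.1                       # "1.1" perfectly rounded to an IEEE double
exact = Fraction(x)           # the exact binary value the machine stores

# Both decimal strings parse back to the very same double ...
assert float('1.1000000000000001') == x

# ... but as decimal numbers, 1.1000000000000001 is the closer
# approximation to what the machine actually holds.
err_17 = abs(Fraction('1.1000000000000001') - exact)
err_2 = abs(Fraction('1.1') - exact)
assert err_17 < err_2
```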
But since we made the change anyway, it had a wonderful consequence: ...
This is terrible, not wonderful. ...
We've been through all this before, so I'm heartened to see that we'll still never agree <wink>.

On Tue, 2004-03-30 at 14:40, Tim Peters wrote:
We've been through all this before, so I'm heartened to see that we'll still never agree <wink>.
Perhaps we need to write a very short PEP that explains why things are the way they are. I enjoy the occasional break from discussions about decorator syntax as much as the next guy, but I'd rather discuss new controversies like generator expression binding rules than rehash the same old discussions. Jeremy

But if you're moving to a wider precision, surely there is an even better decimal approximation to the IEEE-rounded "1.1" than 1.1000000000000001 (with even more digits), so isn't the preceding paragraph a justification for using that approximation instead?

[Andrew Koenig]
Like Ping, you're picturing typing in "1.1" by hand, so that you *know* decimal 1.1 on-the-nose is the number you "really want". But repr() can't know that -- it's a general principle of 754 semantics for each operation to take the bits it's fed at face value, because the implementation can't guess intent, and it's likely to create more problems than it solves if it tries to "improve" the bits it actually sees. So far as reproducing observed results as closely as possible goes, the wider machine will in fact do better if it sees "1.1000000000000001" instead of "1.1", because the former is in fact a closer approximation to the number the narrower machine actually *uses*.

Suppose you had a binary float format with 3 bits of precision, and the result of a computation on that box is .001 binary = 1/8 = 0.125 decimal. The "shortest-possible reproducing decimal representation" on that box is 0.1. Is it more accurate to move that result to a wider machine via the string "0.1" or via the string "0.125"? The former is off by 25%, but the latter is exactly right. repr() on the former machine has no way to guess whether the 1/8 it's fed is the result of the user typing in "0.1" or the result of dividing 1.0 by 8.0. By taking the bits at face value, and striving to communicate that as faithfully as possible, it's explainable, predictable, and indeed as faithful as possible. "Looks pretty too" isn't a requirement for serious floating-point work.
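Tim's 3-bit thought experiment can be replayed with exact rationals (a sketch; the 3-bit float format itself is hypothetical, and only the arithmetic is checked):

```python
from fractions import Fraction

stored = Fraction(1, 8)   # the 3-bit machine's actual value, 0.125 decimal

# "0.1" is the shortest decimal that would read back as 1/8 on the
# 3-bit box; "0.125" is the exact decimal. On a wider machine:
err_short = abs(Fraction('0.1') - stored)    # off by 0.025 (25% of 0.1)
err_exact = abs(Fraction('0.125') - stored)  # exactly right

assert err_exact == 0
assert err_short == Fraction(1, 40)
```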

Tim Peters wrote:
It seems like most people who write '1.1' don't really want to dive into serious floating-point work. I wonder if literals like 1.1 should generate decimal objects as described by PEP 327, rather than floats. Operations on decimals that have no finite decimal representation (like 1.1 / 3) should use floating-point arithmetic internally, but they should store in decimal rather than floating-point format. Shane

[Shane Hathaway]
Well, one of the points of the Decimal module is that it gives results that "look like" people get from pencil-and-paper math (or hand calculators). So, e.g., I think the newbie traumatized by not getting back "0.1" after typing 0.1 would get just as traumatized if moving to binary fp internally caused 1.1 / 3.3 to look like

    0.33333333333333337

instead of

    0.33333333333333333

If they stick to Decimal throughout, they will get the latter result (and they'll continue to get a string of 3's for as many digits as they care to ask for).

Decimal doesn't suffer string<->float conversion errors, but beyond that it's prone to all the same other sources of error as binary fp. Decimal's saving grace is that the user can boost working precision to well beyond the digits they care about in the end. Kahan always wrote that the best feature of IEEE-754 to ease the lives of the fp-naive is the "double extended" format, and HW support for that is built in to all Pentium chips. Alas, most compilers and languages give no access to it. The only thing Decimal will have against it in the end is runtime sloth.
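The two properties Tim describes -- exact string conversion, and user-adjustable working precision -- look like this with the `decimal` module that eventually shipped (a sketch, not part of the original discussion):

```python
from decimal import Decimal, getcontext

# String conversion is exact: no 0.1000...001 surprises.
assert str(Decimal('0.1')) == '0.1'

# Division is still inexact, but the user can crank the working
# precision well beyond the digits they care about.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)
assert str(third) == '0.' + '3' * 50   # fifty 3's, as many as asked for
```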

On Tue, 30 Mar 2004, Tim Peters wrote:
The only thing Decimal will have against it in the end is runtime sloth.
While the Decimal implementation is in Python, certainly. However, I did some floating point calculation timings a while back, and the Python FP system is slow due to the overhead of working out types, unpacking the values, and repacking the value. The actual native calculation is a small portion of that time.

My question is: Is it possible that a C implementation of Decimal would be almost as fast as native floating point in Python for reasonable digit lengths and settings? (i.e. use native FP as an approximation and then do some tests to get the last digit right.)

The intent here is to at least propose as a strawman that Python use a C implementation of Decimal as its native floating point type. This is similar to the long int/int unification. Long ints are slow, but things are okay as long as the numbers are within the native range. The hope would be that Decimal configurations which fit within the machine format are reasonably fast, but things outside it slow down.

Please note that nowhere did I comment that creating such a C implementation of Decimal would be easy or even possible. ;) -a

On Tue, Mar 30, 2004, Andrew P. Lentvorski, Jr. wrote:
Basic answer: yes, for people not doing serious number crunching
Well, that won't happen. The long/int issue at least has compatibility at the binary level; binary/decimal conversions lead us right back to the problems that Decimal is trying to fix.
Please note that nowhere did I comment that creating such a C implementation of Decimal would be easy or even possible. ;)
Actually, the whole point of the Decimal class is that it's easy to implement. Once we agree on the API and semantics, converting to C should be not much harder than trivial. Although I ended up dropping the ball, that's the whole reason I got involved with Decimal in the first place: the intention is that Decimal written in C will release the GIL. It will be an experiment in computational threading. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "usenet imitates usenet" --Darkhawk

"Andrew P. Lentvorski, Jr." <bsder@allcaps.org>:
That sounds like an extremely tricky thing to do, and it's not immediately obvious that it's even possible. But maybe it would still be "fast enough" doing it all properly in decimal? -- Greg Ewing

On Tue, 30 Mar 2004, Tim Peters wrote:
All right. Maybe we can make some progress. I agree that round-to-12 was a real problem. But i think we are talking about two different use cases: compiling to disk and displaying on screen. I think we can satisfy both desires.

If i understand you right, your primary aim is to make sure the marshalled form of any floating-point number yields the closest possible binary approximation to the machine value on the original platform, even when that representation is used on a different platform. (Is that correct? Perhaps it's best if you clarify -- exactly what is the invariant you want to maintain, and what changes [in platform or otherwise] do you want the representation to withstand?)

That doesn't have to be part of repr()'s contract. (In fact, i would argue that already repr() makes no such promise.) repr() is about providing a representation for humans.

Can we agree on maximal precision for marshalling, and shortest-accurate precision for repr, so we can both be happy? (By shortest-accurate i mean "the shortest representation that converts to the same machine number". I believe this is exactly what Andrew described as Scheme's method. If you are very concerned about this being a complex and/or slow operation, a fine compromise would be a "try-12" algorithm: if %.12g is accurate then use it, and otherwise use %.17g. This is simple, easy to implement, produces reasonable results in most cases, and has a small bound on CPU cost.)

    def try_12(x):
        rep = '%.12g' % x
        if float(rep) == x:
            return rep
        return '%.17g' % x

    def shortest_accurate(x):
        for places in range(17):
            fmt = '%.' + str(places) + 'g'
            rep = fmt % x
            if float(rep) == x:
                return rep
        return '%.17g' % x

-- ?!ng
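Ping's two sketches can be exercised directly (his functions restated here so the block is self-contained; behaviour assumes IEEE doubles and C-style %g formatting):

```python
def try_12(x):
    # Use 12 significant digits when they round-trip, else fall back to 17.
    rep = '%.12g' % x
    if float(rep) == x:
        return rep
    return '%.17g' % x

def shortest_accurate(x):
    # Try successively more significant digits until the string
    # converts back to exactly the same machine number.
    for places in range(17):
        fmt = '%.' + str(places) + 'g'
        rep = fmt % x
        if float(rep) == x:
            return rep
    return '%.17g' % x

# Both round-trip exactly ...
for v in (1.1, 2.2 - 1.2 - 1, 1e300):
    assert float(try_12(v)) == v
    assert float(shortest_accurate(v)) == v

# ... and the simple literal comes back short.
assert shortest_accurate(1.1) == '1.1'
assert try_12(1.1) == '1.1'
```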

[Ping]
All right. Maybe we can make some progress.
Probably not -- we have indeed been thru all of this before.
Those are two of the use cases, yes.
I want marshaling of fp numbers to give exact (not approximate) round-trip equality on a single box, and across all boxes supporting the 754 standard where C maps "double" to a 754 double. I also want marshaling to preserve as much accuracy as possible across boxes with different fp formats, although that may not be practical. Strings have nothing to do with that, except for the historical accident that marshal happens to use decimal strings. Changing repr(float) to produce 17 digits went a very long way toward achieving all that at the time, with minimal code changes. The consequences of that change I *really* like didn't become apparent for years.
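Both round-trip routes Tim mentions can be checked (a sketch assuming 754 doubles; `struct`'s 8-byte big-endian double stands in here for pickle's binary float format):

```python
import struct

for x in (1.1, 0.1 + 0.2, 2.2 - 1.2 - 1, 1e300):
    # 17 significant decimal digits preserve exact equality ...
    assert float('%.17g' % x) == x
    # ... and an 8-byte IEEE double round-trips exactly by construction.
    assert struct.unpack('>d', struct.pack('>d', x))[0] == x

# Fewer than 17 digits can't be relied on:
y = 0.1 + 0.2                        # 0.30000000000000004
assert float('%.16g' % y) != y       # '%.16g' collapses it to '0.3'
```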
As above. Beyond exact equality across suitable 754 boxes, we'd have to agree on a parameterized model of fp, and explain "as much accuracy as possible" in terms of that. But you don't care, so I won't bother <wink>.
That doesn't have to be part of repr()'s contract. (In fact, i would argue that already repr() makes no such promise.)
It doesn't, but the docs do say: If at all possible, this [repr's result] should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). This is possible for repr(float), and is currently true for repr(float) (on 754-conforming boxes).
repr() is about providing a representation for humans.
I think the docs are quite clear that this function belongs to str(): .... the ``informal'' string representation of an object. This differs from __repr__() in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead. The return value must be a string object.
Can we agree on maximal precision for marshalling,
I don't want to use strings at all for marshalling. So long as we do, 17 is already correct for that purpose (< 17 doesn't preserve equality, > 17 can't be relied on across 754-conforming C libraries).
and shortest-accurate precision for repr, so we can both be happy?
As I said before (again and again and again <wink>), I'm the one who has fielded most newbie questions about fp since Python's beginning, and I'm very happy with the results of changing repr() to produce 17 digits. They get a little shock at the start now, but potentially save themselves from catastrophe by being forced to grow some *necessary* caution about fp results early. So, no, we're not going to agree on this. My answer for newbies who don't know and don't care (and who are determined never to know or care) has always been to move to a Decimal module. That's less surprising than binary fp in several ways, and 2.4 will have it.
Yes, except that Scheme also requires that this string be correctly rounded to however many digits are produced. A string s just satisfying eval(s) == some_float needn't necessarily be correctly rounded to s's precision.
Also unique to Python.
Our disagreement is more fundamental than that. Then again, it always has been <smile!>.

On Tue, 30 Mar 2004, Tim Peters wrote:
That is a valuable property. I support it and support Python continuing to have that property. I hope it has been made quite clear by now that this property does not constrain how numbers are displayed by the interpreter in human-readable form. The issue of choosing an appropriate string representation of a number is unaffected by the desire for the above property.
I think we *have* made progress. Now we can set aside the red-herring issue of platform-independent serialization and focus on the real issue: human-readable string representation. So let's look at what you said about Python's accessibility:
Now you are pulling rank. I cannot dispute your longer history and greater experience with Python; it is something i greatly admire and respect. I also don't know your personal experiences teaching Python. But i can tell you my experiences.

And i can tell you that i have tried to teach Python to many people, individually and in groups. I taught a class in Python at UC Berkeley last spring to 22 people who had never used Python before. I maintained good communication with the students and their feedback was very positive about the class.

How did the class react to floating-point? Seeing behaviour like this:

    >>> 3.3
    3.2999999999999998
    >>>

confused and frightened them, and continues to confuse and frighten almost everyone i teach. (The rare exceptions are the people who have done lots of computational work before and know how binary floating-point representations work.) Every time this happens, the teaching is derailed and i am forced to go into an explanation of binary floating-point to assuage their fears.

Remember, i am trying to teach basic programming skills. How to solve problems; how to break down problems into steps; what's a subroutine; and so on. Aside from this floating-point thing throwing them off, Python is a great first language for new programmers. This is not the time to talk about internal number representation.

I am tired of making excuses for Python. I love to tell people about Python and show them what it can do for them. But this floating-point problem is embarrassing. People are confused because no other system they've seen behaves like this. Other languages don't print their numbers like this. Accounting programs and spreadsheets don't print their numbers like this. Matlab and Maple and Mathematica don't print their numbers like this. Only Python insists on being this ugly. And it screws up the most common way that people first get to know Python -- as a handy interactive calculator.

And for what?
For no gain at all -- because when you limit your focus to the display issue, the only argument you're making is "People should be frightened." That's a pointless reason. Everything in Python -- everything in computers, in fact -- is a *model*. We don't expect the model to be perfectly accurate or to be completely free of limitations. IEEE 754 happens to be the prevalent model for the real number line. We don't print every string with a message after it saying "WARNING: MAXIMUM LENGTH 4294967296", and we shouldn't do the same for floats.

"3.2999999999999998" does not give you any more information than "3.3". They both represent exactly the same value. "3.3" is vastly easier to read. The only reason you seem to want to display "3.2999999999999998" is to frighten people. So why not display "3.3 DANGER DANGER!!"? Even that would be much easier to read, but my point is that i hope it exposes the problem with your argument.

I'm asking you to be more realistic here. Not everyone runs into floating-point corner cases. In fact, very few people do. I have never encountered such a problem in my entire history of using Python. And if you surveyed the user community, i'm sure you would find that only a small minority cares enough about the 17th decimal place for the discrepancy to be an issue. Now, why should that minority make it everyone else's problem?

Is this CP4E or CPFPFWBFP? Computer Programming For Everybody? Or Computer Programming For People Familiar With Binary Floating-Point? You say it's better for people to get "bitten early". What's better: everyone suffering for a problem that will never affect most of them, or just those who care about the issue having to deal with it?

Beautiful is better than ugly. Practicality beats purity. Readability counts.

-- ?!ng

On 2004-04-06, at 14.46, Ka-Ping Yee wrote:
So how should "2.2 - 1.2 - 1" be represented?

    Matlab (Solaris 9):          2.22044604925031e-16
    Octave (MacOS X 10.3):       2.22044604925031e-16
    Python 2.3.3 (MacOS X 10.3): 2.2204460492503131e-16

Is this something you accept since Matlab does it?

On Tue, 6 Apr 2004, Simon Percivall wrote:
I accept it, but not primarily because Matlab does it. It is fine to show a value different from zero in this case because the result really is different from zero. In this case those digits are necessary to represent the machine number. In other cases (such as 1.1) they are not. -- ?!ng
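Ping's distinction -- digits that are necessary versus digits that are not -- is easy to state in round-trip terms (a sketch assuming IEEE doubles):

```python
x = 2.2 - 1.2 - 1          # really nonzero: roughly 2.22e-16
assert x != 0

# For this value, a short decimal does NOT recover the machine number,
# while 17 significant digits do:
assert float('%.6g' % x) != x
assert float('%.17g' % x) == x

# For 1.1, two digits already recover the machine number exactly,
# so the long tail of digits carries no extra information.
assert float('1.1') == 1.1
```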

On Tue, Apr 06, 2004, Ka-Ping Yee wrote:
The point is that anyone who relies on floating point will almost certainly end up with the equivalent of Simon's case at some point. The most common *really* ugly case that's hard for new programmers to deal with is:

    section_size = length / num_sections
    tmp = 0
    while tmp < length:
        process(section)
        tmp += section_size

and variations thereof. Just because every other programming language hides this essential difficulty is no reason to follow along. I don't have Tim's experience, but until we added that section to the tutorial, I was one of the regular "first responders" on c.l.py who was always dealing with this issue. I stand with Uncle Timmy -- it's been much easier now. If you want to teach Python to complete newcomers, you need to do one of two things:

* Avoid floating point
* Start by explaining what's going on (use Simon's example above for emphasis)

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups?
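Aahz's loop pattern can be made concrete (a hypothetical sketch; the counter stands in for his process() call, and the iteration counts assume IEEE doubles):

```python
def count_sections(length, num_sections):
    """Count how many times Aahz's loop body runs."""
    section_size = length / num_sections
    tmp = 0.0
    iterations = 0
    while tmp < length:
        iterations += 1          # stands in for process(section)
        tmp += section_size
    return iterations

# Ten sections of a length-1.0 run -- but the loop runs 11 times,
# because ten additions of 0.1 sum to 0.9999999999999999 < 1.0.
assert count_sections(1.0, 10) == 11

# With a binary-exact section size (0.125), it behaves as expected.
assert count_sections(1.0, 8) == 8
```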

This is an argument for making some form of decimal floating point the default in Python, with binary FP an option for advanced users -- not an argument for trying to hide the fact that binary FP is being used. Because you *can't* hide that fact completely, and as has been pointed out, it *will* rear up and bite these people eventually. It's much better if that happens early, while there is someone on hand who understands the issues and can guide them gently through the shock-and-awe phase. -- Greg Ewing

[Tim]
[Ping]
That is a valuable property. I support it and support Python continuing to have that property.
That's good, since nobody opposes it <wink>.
...
I don't think that's "the" real issue, but it is one of several. ...
Now you are pulling rank.
I'm relating my experience, which informs my beliefs about these issues more than any "head argument".
In person, mostly to hardware geeks and other hardcore software geeks. On mailing lists and newsgroups, to all comers, although I've had decreasing time for that as the years drag on.
Sorry, but so long as they stick to binary fp, stuff like that can't be avoided, even using "your rule" (other examples of that were posted today, and I won't repeat them again here). I liked Python's *former* rule myself (repr rounds to 12 significant digits), and would like it much better than "shortest possible" (which still shows tons of crap I usually don't care about) most days for my own uses.

That's a real problem Python hasn't addressed: its format decisions are often inappropriate and/or undesirable (both can differ by app and by audience and by object type), and there are insufficient hooks for overriding these decisions. sys.displayhook goes a bit in that direction, but not far enough.

BTW, if your students *remain* confused & frightened, it could be you're not really trying to explain binary fp reality to them.
Then they *don't* remain confused & frightened? Great. Then they've been educated. How long can it take to read the Tutorial Appendix? It's well worth however many years it takes <wink>.
Use Decimal instead. That's always been the best idea for newbies (and for most casual users of floating-point, newbie or not).
If you're teaching "basic programming skills", what other systems have they seen? Hand calculators for sure -- which is why they should use Decimal instead. Virtually nothing about it will surprise them, except the liberating ability to crank up the precision.
Other languages don't print their numbers like this. Accounting programs and spreadsheets don't print their numbers like this.
I don't care -- really. I'm thoroughly in agreement with Kahan on this; see, e.g., section "QPRO 4.0 and QPRO for Windows" in http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf

    ... the reader can too easily misinterpret a few references to 15 or 16
    sig. dec of precision as indications that no more need be said about
    QPRO's arithmetic. Actually much more needs to be said because some of
    it is bizarre. Decimal displays of Binary nonintegers cannot always be
    WYSIWYG. Trying to pretend otherwise afflicts both customers and
    implementors with bugs that go mostly misdiagnosed, so fixing one bug
    merely spawns others. ...

    The correct cure for the @ROUND and @INT (and some other) bugs is not
    to fudge their argument but to increase from 15 to 17 the maximum
    number of sig. dec. that users of QPRO may see displayed. But no such
    cure can be liberated from little annoyances: [snip things that make
    Ping's skin crawl about Python today] ...

    For Quattro's intended market, mostly small businesses with little
    numerical expertise, a mathematically competent marketing follow-
    through would have chosen either to educate customers about binary
    floating-point or, more likely, to adopt decimal floating-point
    arithmetic even if it runs benchmarks slower.

The same cures are appropriate for Python.
Matlab and Maple and Mathematica don't print their numbers like this.
Those are designed for experts (although Mathematica pretends not to be).
Sorry, that's an absurd recharacterization, and I won't bother responding to it. If you really can't see any more to "my side" of the argument than that yet, then repeating it another time isn't going to help. So enough of this. In what time I can make for "stuff like this", I'm going to try to help the Decimal module along instead. Do what you want with interactive display of non-decimal floats, but do try to make it flexible instead of fighting tooth and nail just to replace one often-hated fixed behavior with another to-be-often-hated fixed behavior. ...
Not everyone runs into floating-point corner cases. In fact, very few people do.
Heh. I like to think that part of that has to do with the change to repr()! As I've said many times before, we *used* to get reports of a great variety of relatively *subtle* problems due to binary fp behavior from newbies; we generally get only one now, and the same one every time. They're not stupid, Ping, they just need the bit of education it takes to learn something about that expensive fp hardware they bought.
I have never encountered such a problem in my entire history of using Python.
Pride goeth before the fall ...
The result of int() can change by 1 when the last bit changes, and the difference between 2 and 3 can be a disaster -- see Kahan (op. cit.) for a tale of compounded woe following from that one. Aahz's recent example of a loop going around one time more or less "than expected" used to be very common, and is the same thing in a different guise. It's like security that way: nobody gives a shit before they get burned, and then they get livid about it. If a user believes 0.1 is one tenth, they're going to get burned by it.
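A concrete instance of int() changing by 1 on the last bit, in the spirit of Aahz's loop example (a classic sketch, assuming IEEE doubles):

```python
# 0.1 + 0.7 is stored as 0.7999999999999999..., a hair *below* 0.8,
# so scaling up and truncating loses a whole unit:
x = (0.1 + 0.7) * 10
assert x < 8.0
assert int(x) == 7    # a user who believes 0.1 is one tenth expects 8
```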
The force of this is lost because you don't have a way to spare users from "unexpected extra digits" either. It comes with the territory! It's inherent in using binary fp in a decimal world. All you're really going on about is showing "funny extra digits" less often -- which will make them all the more mysterious when they show up. I liked the former round-to-12-digits behavior much better on that count. I expect to like Decimal mounds better on all counts except speed.
participants (15)
- Aahz
- Andrew Koenig
- Andrew P. Lentvorski, Jr.
- Bob Ippolito
- Dan Sugalski
- Dennis Allison
- Edward Loper
- Greg Ewing
- Jeremy Hylton
- Josiah Carlson
- Ka-Ping Yee
- Shane Hathaway
- Simon Percivall
- Terry Reedy
- Tim Peters