Mark,

I don't know if it has occurred to you, but several of the rhetorical forms you have used are pretty upsetting to many Python devs. (I can give you a list offline.) It would really help the discussion if you tried to keep your rhetoric in check. (A few other participants might also want to focus more on the topic and less on Mark.)

Now I'd like to restate the core issue in ways that should appeal to Python devs. I am describing the status quo in Python 3.4, since that's the starting point for any evolution of Python. Please bear with me as I describe the status quo -- I find it necessary as a context for any future proposals.

Python's parser recognizes a few different types of number literals: integers (e.g. 42), "decimal point" notation (e.g. 3.14), "exponential notation" (e.g. 2.1e12), and any of these followed by 'j'. There are also binary, octal and hex notations for integers (e.g. 0b1010, 0o12, 0xa). The parser, by design, does not have any understanding of the context in which these numbers are used -- just as it doesn't know about the types of variables (even when it's obvious -- e.g. after seeing "x = 4" the parser does not keep track of the fact that x is now an integer with the value 4, although it does remember it has seen an assignment to x).

What the parser does with those number literals (as with string literals and with the keywords None, True, False) is that it creates an object of an appropriate type with an appropriate value. This object is then used as a value in the expression in which it occurs.

The current algorithm for creating the object is to create an int (which is implemented as an exactly represented arbitrary-precision integer) when the literal is either binary/octal/hex or decimal without point or exponent, a Python float (which is an object wrapping an IEEE 754 double) when a decimal point or exponent is present, and a Python complex (which wraps *two* IEEE doubles) when a trailing 'j' is seen.
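A minimal sketch of that mapping from spelling to type, assuming a Python 3 interpreter (the dictionary name is just for illustration):

```python
# The type of the object depends only on how the literal is spelled.
literal_types = {
    "42": type(42),          # plain decimal integer
    "0b1010": type(0b1010),  # binary
    "0o12": type(0o12),      # octal
    "0xa": type(0xa),        # hex
    "3.14": type(3.14),      # decimal point present
    "2.1e12": type(2.1e12),  # exponent present
    "1e3": type(1e3),        # exponent present, even though the value is whole
    "42j": type(42j),        # trailing 'j'
    "3.14j": type(3.14j),    # trailing 'j'
}

for spelling, kind in literal_types.items():
    print(spelling, "->", kind.__name__)

# The three alternative integer notations above all denote ten.
print(0b1010 == 0o12 == 0xa == 10)
```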

Note that negative numbers are not literals -- e.g. -10 is an expression which applies the unary minus operator to an integer object with the value 10. (Similarly, IEEE inf and NaN don't have literals but require evaluating an expression, e.g. float('inf').)
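One way to see this, sketched with the stdlib ast module (the variable names are only for illustration; the compiler may later fold the expression into a constant, but the parse tree shows the unary operator):

```python
import ast
import math

# Parse "-10" as an expression and inspect the top-level node.
tree = ast.parse("-10", mode="eval")
node = tree.body
is_unary_minus = isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)
print(is_unary_minus)

# inf and NaN have no literal spelling; they come from evaluating expressions.
inf = float("inf")
nan = float("nan")
print(math.isinf(inf), math.isnan(nan))
```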

Not all numbers come from literals -- most come from expressions such as x+y, some are read from files or other input media. (E.g. the struct module can "read" IEEE floats and doubles from byte strings containing their "native" binary encoding, and float() with a string argument can interpret decimal numbers with optional decimal point and/or exponent.)
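For illustration, a small sketch of both non-literal sources mentioned here; the little-endian format codes "<d" and "<f" are chosen only to make the example platform-independent:

```python
import struct

# Eight bytes holding the IEEE 754 double encoding of 1.5, read back as a float.
raw = struct.pack("<d", 1.5)
(value,) = struct.unpack("<d", raw)
print(len(raw), value)

# A 4-byte IEEE single cannot hold 0.1 as accurately as a double can,
# so reading it back gives a slightly different Python float.
(single,) = struct.unpack("<f", struct.pack("<f", 0.1))
print(single == 0.1)

# float() accepts decimal text with an optional point and/or exponent.
parsed = [float("42"), float("3.14"), float("2.1e12")]
print(parsed)
```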

The clever thing about expressions (in Python as in many languages) is that the overloading of operators can make the "right" thing happen when numbers of different types are combined. Consider x+y. If x and y are both ints, the result is an int (and is computed exactly). If either is a float and the other an int, the result is a float. If either is a complex and the other is an int or float, the result is a complex.
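A quick sketch of these rules for x+y with the built-in types (result_type is a hypothetical helper, not a stdlib function):

```python
def result_type(x, y):
    """Return the name of the type produced by x + y."""
    return type(x + y).__name__

print(result_type(1, 2))      # int + int
print(result_type(1, 2.0))    # int + float
print(result_type(2.0, 1))    # float + int
print(result_type(1, 2j))     # int + complex
print(result_type(1.5, 2j))   # float + complex

# int arithmetic is exact at any size; float arithmetic is not.
big = 10**20
exact_sum = big + 1           # keeps the trailing 1
float_sum = float(big) + 1    # the 1 is lost to rounding
print(exact_sum - big, float_sum == float(big))
```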

If you are defining your own number-ish type in Python or C code, you can make it play this game too, and this is how the current Decimal and Fraction types work. Python's operator overloading machinery is flexible enough so that e.g. Fraction can say "if a Fraction and an int are added, the result is a Fraction; but if a Fraction and a float are added, the result is a float." (Because Python is dynamic, it would be *possible* to define a Fraction type that returns a plain int for results whose denominator is 1, but the stdlib's Fraction type doesn't do this.)
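The following sketch shows the stdlib Fraction behaving as described, followed by a toy type playing the same game. ExactInt is a made-up class for illustration only; it is not how Fraction or Decimal are actually implemented:

```python
from fractions import Fraction

half = Fraction(1, 2)
print(type(half + 1).__name__)        # Fraction combined with int
print(type(1 + half).__name__)        # same, via the reflected method
print(type(half + 0.5).__name__)      # Fraction combined with float
print(type(Fraction(4, 2)).__name__)  # stays a Fraction even with denominator 1


class ExactInt:
    """Hypothetical number-ish wrapper around an int (illustration only)."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, ExactInt):
            return ExactInt(self.value + other.value)
        if isinstance(other, int):
            return ExactInt(self.value + other)   # stay in our own type
        if isinstance(other, float):
            return self.value + other             # give way to float
        return NotImplemented                     # let the other operand try

    # Addition is commutative here, so the reflected case can share the code.
    __radd__ = __add__


print(type(1 + ExactInt(2)).__name__)
print(type(0.5 + ExactInt(2)).__name__)
```

Returning NotImplemented for unknown operand types is what lets a third type cooperate later without ExactInt having to know about it.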

Most of the time, the built-in types are conservative in their return type (by which I mean that adding two ints returns an int, not a float), but there are a few exceptions: int/int returns a float (because returning an int would require truncation or a compound (div, mod) result), and the ** (power) operator also has a few special cases (int**negative_int and negative_float**non_integer).
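These exceptions can be checked directly; a brief sketch, assuming Python 3 semantics:

```python
# True division of two ints always gives a float, even when it divides evenly.
print(type(7 / 2).__name__, type(6 / 2).__name__)

# The int-preserving alternatives: floor division and the compound (div, mod).
print(7 // 2, divmod(7, 2))

# int ** non-negative int stays an int; int ** negative int gives a float.
print(type(2 ** 3).__name__, type(2 ** -1).__name__)

# negative float ** non-integer gives a complex; an integer exponent does not.
root = (-8.0) ** 0.5
square = (-8.0) ** 2
print(type(root).__name__, type(square).__name__)
```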

Now let's move on to the problems, which begin with the current float type. Given the representation it is obvious that various limitations exist; results may have to be truncated to fit into 64 bits, and conversion to and from decimal cannot always preserve the exact value. It turns out that the truncation of results doesn't bother most users (everyone knows what a typical hand calculator does for 1/3), but the issues around conversion to and from decimal often trip up users on their initial forays into learning the language. The improvement we made to the output conversion (dropping digits that don't affect round-tripping) takes some of the sting out of this, but Python newbies still excitedly point out some of the unavoidable anomalies like 1.1 + 2.2 ==> 3.3000000000000003 when they first discover it.
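A short sketch of the anomaly and of what the round-tripping repr() does and doesn't hide; Decimal is used here only as a way to display the exact binary value a float holds:

```python
from decimal import Decimal

total = 1.1 + 2.2
print(repr(total))            # the classic surprise
print(total == 3.3)           # the sum is not the float nearest to 3.3

# repr() emits the shortest string that converts back to the same float.
print(repr(1.1))
print(float(repr(total)) == total)

# The value actually stored for 1.1 is close to, but not exactly, 1.1.
stored = Decimal(1.1)
print(stored)
print(stored == Decimal("1.1"))
```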

So what to do about it?

If we aren't too worried about strict backwards compatibility, there's an easy answer that should shut up the newbies: change the parser so that when it sees a number with a decimal point or an exponent it returns a Decimal instance, and make a bunch of other adjustments to match. (E.g. repr() of the Decimal 3.14 should return '3.14' instead of "Decimal('3.14')".) The float() builtin should also return a Decimal (we should probably just rename the whole type to 'float'). The math module will have to be rewritten. We should probably preserve the existing binary float under some other name, perhaps requiring an import. We'll have to decide what to do about complex numbers. (One option would be to remove supporting them in the parser.) The details can be worked out in a PEP. I don't expect this process to be easy or without controversies, but I'm confident we could come up with a good design.

But there are big problems with such a proposal.

Those newbies perhaps aren't Python's most important user population. (And for many of them it is actually a moment of enlightenment on their way to becoming experts, once they hear the explanation.) The problems with binary floats don't affect most actual calculations (never mind 1.1+2.2, once you've seen what 1/3 looks like you switch all your output to some kind of rounded format). Many of the problems with floating point that actually matter (such as truncation, or infinities, or NaN) don't go away just by switching everything to Decimal.
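To make that last point concrete, here is a small sketch using the stdlib decimal module. The 28-digit precision is set explicitly so the example does not depend on the ambient context:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28

    # The newbie example does come out "right" in decimal ...
    tidy = Decimal("1.1") + Decimal("2.2")
    print(tidy)

    # ... but 1/3 is still cut off at the context precision,
    # so multiplying back by 3 does not recover 1.
    third = Decimal(1) / Decimal(3)
    back = third * 3
    print(third)
    print(back == Decimal(1))

# Infinities and NaN exist in Decimal as well.
dec_inf = Decimal("Infinity")
dec_nan = Decimal("NaN")
print(dec_inf.is_infinite(), dec_nan.is_nan())
```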

For many Python programs the switch to decimal would not matter at all. For example, in all my own recent coding (which focuses on network I/O), the only uses I've had for float are for timekeeping (which is inherently approximate) and to print some percentages in a report. Such code wouldn't be affected at all -- it wouldn't break, but it wouldn't be any simpler either. Decimal/binary simply isn't an issue here.

However, for the growing contingent of scientists who use Python as a replacement for Matlab (not Mathematica!), it could be a big nuisance. They don't care about decimal issues (they actually like binary better) and they write lots of C++ code that interfaces between CPython's internal API and various C++ libraries for numeric data processing, all of which use IEEE binary. (If anything, they'd probably want a 32-bit binary float literal more than a decimal float literal.) Python already defines some macros for converting between standard Python float objects and C doubles (e.g. PyFloat_AS_DOUBLE), so technically we could support source-code compatibility for these users, but performance would definitely suffer (currently that macro just returns the C double value already present in the object; it would have to be changed to call a costly decimal-to-binary conversion routine).

There is also the huge effort of actually implementing such a proposal. It's not insurmountable, but I estimate it would be at least as much work as the str->unicode conversion we did for Python 3.

All in all I just don't see the stars aligned for this proposal. (Nor for anything even more sweeping like representing 1/3 as a fraction or some other symbolic representation.)

So what to do instead? A few simpler proposals have been made.

Maybe we can add a new literal notation meaning 'decimal'; that would be a relatively straightforward change to the parser (once the new decimal extension module is incorporated), but it would not do much to avoid surprising newbies (certainly you can't go teaching them to always write 3.14d instead of 3.14). However, it would probably help out frequent users of Decimal. (Then again, they might not be using literals that much -- I imagine the source of most such programs is user or file input.)

I'm not excited about a notation for Fraction -- the use cases are too esoteric.

Maybe we can fix the conversion between Decimal and float (if this is really all that matters to Mark, as it appears to be from his last email -- I'd already written most of the above before it arrived). Could it be as simple as converting the float to a string using repr()? Given the smarts in the float repr() that should fix the examples Mark complained about. Are there any roadblocks in the implementation or assumptions of the Decimal type here? Perhaps the default Decimal context being set for a much higher precision makes it philosophically unacceptable?
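As a rough sketch of that repr() idea (an illustration only, not a claim about how Decimal does or should convert floats), compare the exact conversion with one that goes through the float's repr():

```python
from decimal import Decimal

x = 1.1

# Exact conversion: every binary digit of the float is preserved.
exact = Decimal(x)

# Conversion via repr(): uses the shortest round-tripping decimal string.
via_repr = Decimal(repr(x))

print(exact)
print(via_repr)
print(exact == via_repr)

# The repr-based value still converts back to the original float.
print(float(via_repr) == x)

# It does not hide genuine arithmetic error, though.
summed = Decimal(repr(1.1 + 2.2))
print(summed)
```

The Decimal constructor itself is exact and ignores the context precision; the context only comes into play in later arithmetic.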

I think I've said as much as I can, for now.

--
--Guido van Rossum (python.org/~guido)