[Python-ideas] Python Numbers as Human Concept Decimal System

Guido van Rossum guido at python.org
Fri Mar 7 20:20:00 CET 2014


Mark,

I don't know if it has occurred to you, but several of the rhetorical forms
you have used are pretty upsetting to many Python devs. (I can give you a
list offline.) It would really help the discussion if you tried to keep
your rhetoric in check. (A few other participants might also want to focus
more on the topic and less on Mark.)

Now I'd like to restate the core issue in ways that should appeal to Python
devs. I am describing the status quo in Python 3.4, since that's the
starting point for any evolution of Python. Please bear with me as I
describe the status quo -- I find it necessary as a context for any future
proposals.

Python's parser recognizes a few different types of number literals:
integers (e.g. 42), "decimal point" notation (e.g. 3.14), "exponential
notation" (e.g. 2.1e12), and any of these followed by 'j'. There are also
binary, octal and hex notations for integers (e.g. 0b1010, 0o12, 0xa). The
parser, by design, does not have any understanding of the context in which
these numbers are used -- just as it doesn't know about the types of
variables (even when it's obvious -- e.g. after seeing "x = 4" the parser
does not keep track of the fact that x is now an integer with the value 4,
although it does remember it has seen an assignment to x).

What the parser does with those number literals (as with string literals
and with the keywords None, True, False) is that it creates an object of an
appropriate type with an appropriate value. This object is then used as a
value in the expression in which it occurs.

The current algorithm for creating the object is to create an int (which is
implemented as an exactly represented arbitrary-precision integer) when the
literal is either binary/octal/hex or decimal without point or exponent, a
Python float (which is an object wrapping an IEEE 754 double) when a
decimal point or exponent is present, and a Python complex (which wraps
*two* IEEE doubles) when a trailing 'j' is seen.
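
For example, in a 3.4 session:

>>> type(42), type(0xa)
(<class 'int'>, <class 'int'>)
>>> type(3.14), type(2.1e12)
(<class 'float'>, <class 'float'>)
>>> type(3j)
<class 'complex'>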

Note that negative numbers are not literals -- e.g. -10 is an expression
which applies the unary minus operator to an integer object with the value
10. (Similarly, IEEE inf and NaN don't have literals but require
evaluating an expression, e.g. float('inf').)
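
You can see this in the AST (node names as of 3.4's ast module):

>>> import ast
>>> ast.dump(ast.parse('-10', mode='eval'))
'Expression(body=UnaryOp(op=USub(), operand=Num(n=10)))'
>>> float('inf'), float('nan')
(inf, nan)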

Not all numbers come from literals -- most come from expressions such as
x+y, some are read from files or other input media. (E.g. the struct module
can "read" IEEE floats and doubles from byte strings containing their
"native" binary encoding, and float() with a string argument can interpret
decimal numbers with optional decimal point and/or exponent.)
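
For instance:

>>> import struct
>>> struct.pack('<d', 0.25)  # the native IEEE binary encoding of a double
b'\x00\x00\x00\x00\x00\x00\xd0?'
>>> struct.unpack('<d', struct.pack('<d', 0.25))
(0.25,)
>>> float('2.5e-1')
0.25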

The clever thing about expressions (in Python as in many languages) is that
the overloading of operators can make the "right" thing happen when numbers
of different types are combined. Consider x+y. If x and y are both ints,
the result is an int (and is computed exactly). If either is a float and
the other an int, the result is a float. If either is a complex and the
other is an int or float, the result is a complex.
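
Concretely:

>>> type(1 + 2)
<class 'int'>
>>> type(1 + 2.0)
<class 'float'>
>>> type(1.0 + 2j)
<class 'complex'>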

If you are defining your own number-ish type in Python or C code, you can
make it play this game too, and this is how the current Decimal and
Fraction types work. Python's operator overloading machinery is flexible
enough so that e.g. Fraction can say "if a Fraction and an int are added,
the result is a Fraction; but if a Fraction and a float are added, the
result is a float." (Because Python is dynamic, it would be *possible* to
define a Fraction type that returns a plain int for results whose
denominator is 1, but the stdlib's Fraction type doesn't do this.)
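
For example:

>>> from fractions import Fraction
>>> Fraction(1, 3) + 1
Fraction(4, 3)
>>> Fraction(1, 2) + 0.75   # mixing with a float gives a float
1.25
>>> Fraction(1, 2) + Fraction(1, 2)   # denominator 1, but still a Fraction
Fraction(1, 1)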

Most of the time, the built-in types are conservative in their return type
(by which I mean that adding two ints returns an int, not a float), but
there are a few exceptions: int/int returns a float (because returning an
int would require truncation or a compound (div, mod) result), and the **
(power) operator also has a few special cases (int**negative_int and
negative_float**non_integer).
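
That is:

>>> 7 / 2
3.5
>>> divmod(7, 2)
(3, 1)
>>> 2 ** -2
0.25
>>> (-1.0) ** 0.5
(6.123233995736766e-17+1j)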

Now let's move on to the problems, which begin with the current float type.
Given the representation it is obvious that various limitations exist;
results may have to be truncated to fit into 64 bits, and conversion to and
from decimal cannot always preserve the exact value. It turns out that the
truncation of results doesn't bother most users (everyone knows what a
typical hand calculator does for 1/3), but the issues around conversion to
and from decimal often trip up users on their initial forays into
learning the language. The improvement we made to the output conversion
(dropping digits that don't affect round-tripping) takes some of the sting
out of this, but Python newbies still excitedly point out some of the
unavoidable anomalies like 1.1 + 2.2 ==> 3.3000000000000003 when they first
discover it.
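
The classic demonstration:

>>> 1.1 + 2.2
3.3000000000000003
>>> repr(0.1)   # the repr is short, but the stored value is not exact
'0.1'
>>> from decimal import Decimal
>>> Decimal(0.1)   # the exact binary value behind that short repr
Decimal('0.1000000000000000055511151231257827021181583404541015625')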

So what to do about it?

If we aren't too worried about strict backwards compatibility, there's an
easy answer that should shut up the newbies: change the parser so that when
it sees a number with a decimal point or an exponent it returns a Decimal
instance, and make a bunch of other adjustments to match. (E.g. repr() of
the Decimal 3.14 should return '3.14' instead of "Decimal('3.14')".) The
float() builtin should also return a Decimal (we should probably just
rename the whole type to 'float'). The math module will have to be
rewritten. We should probably preserve the existing binary float under some
other name, perhaps requiring an import. We'll have to decide what to do
about complex numbers. (One option would be to remove support for them in
the parser.) The details can be worked out in a PEP. I don't expect this
process to be easy or without controversies, but I'm confident we could
come up with a good design.
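
For reference, a 3.4 session showing the status quo such a PEP would be
changing:

>>> from decimal import Decimal
>>> Decimal('3.14')
Decimal('3.14')
>>> import math
>>> math.sqrt(Decimal('2'))   # the math module converts to binary float today
1.4142135623730951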

But there are big problems with such a proposal.

Those newbies perhaps aren't Python's most important user population. (And
for many of them it is actually a moment of enlightenment on their way to
becoming experts, once they hear the explanation.) The problems with binary
floats don't affect most actual calculations (never mind 1.1+2.2, once
you've seen what 1/3 looks like you switch all your output to some kind of
rounded format). Many of the problems with floating point that actually
matter (such as truncation, or infinities, or NaN) don't go away just by
switching everything to Decimal.
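
For example, rounding the output is needed with decimal floats just as
with binary ones:

>>> 1 / 3
0.3333333333333333
>>> '{:.3f}'.format(1 / 3)
'0.333'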

For many Python programs the switch to decimal would not matter at all. For
example, in all my own recent coding (which focuses on network I/O), the
only uses I've had for float are for timekeeping (which is inherently
approximate) and for printing some percentages in a report. Such code wouldn't
be affected at all -- it wouldn't break, but it wouldn't be any simpler
either. Decimal/binary simply isn't an issue here.

However, for the growing contingent of scientists who use Python as a
replacement for Matlab (not Mathematica!), it could be a big nuisance. They
don't care about decimal issues (they actually like binary better) and they
write lots of C++ code that interfaces between CPython's internal API and
various C++ libraries for numeric data processing, all of which use IEEE
binary. (If anything, they'd probably want a 32-bit binary float literal
more than a decimal float literal.) Python already defines some macros for
converting between standard Python float objects and C doubles (e.g.
PyFloat_AS_DOUBLE), so technically we could support source-code
compatibility for these users, but performance would definitely suffer
(currently that macro just returns the C double value already present in
the object; it would have to be changed to perform a costly
decimal-to-binary conversion).

There is also the huge effort of actually implementing such a proposal.
It's not insurmountable, but I estimate it would be at least as much work
as the str->unicode conversion we did for Python 3.

All in all I just don't see the stars aligned for this proposal. (Nor for
anything even more sweeping like representing 1/3 as a fraction or some
other symbolic representation.)

So what to do instead? A few simpler proposals have been made.

Maybe we can add a new literal notation meaning 'decimal'; that would be a
relatively straightforward change to the parser (once the new decimal
extension module is incorporated), but it would not do much to avoid
surprising newbies (certainly you can't go teaching them to always write
3.14d instead of 3.14). However, it would probably help out frequent users
of Decimal. (Then again, they might not be using literals that much -- I
imagine the source of most such programs is user or file input.)
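
Today the explicit spelling is the string constructor, which a
hypothetical 3.14d literal would merely abbreviate:

>>> from decimal import Decimal
>>> Decimal('3.14')
Decimal('3.14')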

I'm not excited about a notation for Fraction -- the use cases are too
esoteric.

Maybe we can fix the conversion between Decimal and float (if this is
really all that matters to Mark, as it appears to be from his last email --
I'd already written most of the above before it arrived). Could it be as
simple as converting the float to a string using repr()? Given the smarts
in the float repr() that should fix the examples Mark complained about. Are
there any roadblocks in the implementation or assumptions of the Decimal
type here? Perhaps the default Decimal context being set for a much higher
precision makes it philosophically unacceptable?
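
For concreteness, here is the difference in today's terms (the second
form being what such a fix would produce):

>>> from decimal import Decimal
>>> Decimal(1.1)          # today: the exact binary value
Decimal('1.100000000000000088817841970012523233890533447265625')
>>> Decimal(repr(1.1))    # the suggested conversion via repr()
Decimal('1.1')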

I think I've said as much as I can, for now.

-- 
--Guido van Rossum (python.org/~guido)