[Python-Dev] Splitting the PEP for adding a decimal type to Python

Guido van Rossum guido@zope.com
Fri, 27 Jul 2001 17:06:37 -0400

> I'm not fond of dialects when they don't serve a significant
> purpose.  However, I believe it would be useful to at least discuss
> creating a special purpose "safe" mode for the Python lexer.  This
> mode would be attractive to newbies and financial programmers.
> Calling this a new dialect is an overstatement.  It is more like
> defining a subset of the language that uses a special vocabulary for
> working with decimal types.

Sounds like a dialect to me.  But alright, I'll take your word for
it. :-)

> > > Another would be to use Unicode as the default character set.  This
> > > would allow Unicode characters to be in strings without needing to
> > > escape them.

> > That's not a dialect, that's a different input encoding.  MAL already
> > has a PEP for that.

> I know about the PEP.  I was referring to making it the default string type
> for a '.dp' file.  There would be no prefix 'u' required.

Have you thought this through?  What would be the input encoding?
How do you expect your programmers to edit their Unicode files?

Otherwise, the only effect of making all string literals Unicode
strings is to break most of the standard library.  You can get this
effect with "python -U" today.  It's not pretty.  (That option exists
to see how much progress has been made with Python's Unicodification,
not for anything very practical.)

> I'll remove this and the other unrelated items from the decimal type PEP

It would indeed be better to focus on one idea at a time.

> If you don't agree with the idea of adding dpython lexer mode then
> there is no point in discussing the features that would be in that
> mode.

Maybe you can rewrite the PEP to explain the idea better.  It wasn't
very clear the first time.

> > Or in Python .NET.  Decoupling the various parts of the parse+compile
> > pipeline is something I've considered.
> Did you decide against it, or has it just not been a high enough priority?

It's one of those many "would-be-nice" things that I never get to...

> > But again this has nothing to do with decimal numbers: your proposal
> > allows the mixing of decimal and binary numbers (as long as one of
> > them uses an explicit base indicator) so you don't really need two
> > parsers -- you need one tokenizer plus a way to specify the default
> > numeric base for literals.
> That is exactly what I implemented.  The dpython command and the
> '.dp' extension cause the Py_USE_DECIMAL_AS_DEFAULT[1] flag to be
> set.  When this flag is set decimal numbers are used for literals.

Where is this flag set?  Is it a global variable?  If my main program
has the .dp extension, does the flag remain set for all other modules
that it imports?

> > I'll have to go back to your defense of the two dialect approach, but
> > I think it's neither sufficient nor necessary.
> I have mixed too many ideas into a PEP.  I'll rework the PEP to remove the 
> cruft and focus on the addition of decimal numbers.  I'll move the other
> ideas into a separate PEP.

Posterity will be grateful.

> > Well, sometimes more generality than you need hurts.  I'm not
> > convinced that we need an open-ended set of numeric literals.  But in
> > the light of the unified numeric model, we may need ways to make
> > exactness or inexactness explicit, and/or we may need a way to specify
> > rational numbers.  If we can fit all of these in the
> > number-with-letter-suffix mold, that would be nice for the lexer, I
> > suppose.
> I worry about a "unified numerical model" getting overly complex.

Funny.  I think that a unified numeric model will take away some
complexity from the current model; for example the programmer would no
longer have to be aware of the limit on int values, so nobody would
have to learn about long any more.
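
A small sketch of what that looks like from the programmer's side, assuming
a Python 3 interpreter, where the int/long distinction is already gone (the
variable names are only illustrative):

```python
import sys

# One past the largest value that fits in a signed machine word.
big = sys.maxsize + 1
assert isinstance(big, int)      # still a plain int; there is no separate long

# Arbitrarily large values need no special suffix or type.
googol = 10 ** 100
assert len(str(googol)) == 101   # a 1 followed by 100 zeros
```

The programmer never has to ask how many bits an integer occupies; the
interpreter switches representations internally.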

> I think decimal numbers help because they are a better choice than
> binary numbers for a significant percentage of all software
> applications.

(Just not for most of the apps that are likely to be written in Python
today. :-)

> I know that rational numbers are important in some applications.  Am
> I overlooking some huge class of applications that use rationals?

I doubt it -- if I was allowed to add exactly *one* numeric type to
Python, and I had to choose between decimal and rational, I'd choose
decimal.  Practicality beats purity.

> While Tim and some of the other Pythoneers can probably think of
> dozens of specialized numerical types, I would venture to guess that
> binary types and a decimal type probably cover 90% of all users'
> requirements.

Add rational, and I'd agree.

> [1] I'll be renaming the flag to this in the next version.  The flag
> is currently called Py_NEW_PARSER.  I named it that because at one
> time I was creating a new parser.  I trimmed the changes down to
> just a few edits of the tokenizer and compile.c

Why does a flag variable have an UPPER_CASE name?  That normally means
the name is a preprocessor symbol.

[Next message]

> > But I foresee serious problems.  Most standard library modules use
> > numbers.  Most of the modules using numbers occasionally use a
> > literal (e.g. 0 or 1).  According to your PEP, literals in module
> > files ending with .py default to binary.  This means that almost
> > any use of a standard library module from your "dpython" will fail
> > as soon as a literal is used.

> No, because the '.py' file will generate bytecode for number
> literals as binary numbers when the module is compiled.  If a '.dp'
> file imports the contents of a '.py' file the binary numbers will be
> imported as binary numbers.  If the '.dp' file needs to use a
> binary number in a calculation with a decimal number, the binary
> number will have to be cast to a decimal number.

I understood all that.  But what if the decimal module wants to pass
some numbers into a binary module?  Then it has to make sure all the
arguments it passes are binary.

> ---------------------
> #gui.py
> BLUE = 155
> x_axis = 1024
> y_axis = 768
> --------------------
> #calculator.dp
> import gui
> ytd_interest = 0.04
> # ytd_interest is now a decimal number
> win = gui.open_window(gui.bg, x_size=gui.x_axis, y_size=gui.y_axis)
> app = win.dialog("Bank Balance", bankbalance_callback)
> bb  = app.get_bankbalance()
> # bb now contains a string
> newbalance = decimal(bb) * ytd_interest
> # now update the display
> app.set_bankbalance(str(newbalance))
> -------------------
> In the example the gui module was used in the calculator module, but its
> numbers were always handled as binary numbers.  The parser did not convert
> them to decimal numbers because they had been parsed into a gui.pyc file
> prior to being loaded into calculator.dp.

Blech.  That means that whenever you use a library module that does
something useful with your data, you have to convert all your data
explicitly to binary, even if it's just integers.  Yuck.  Bah.  (Need
I say more?  OK, one more then.  Argh! :-)

> > I can't believe that this will work satisfactorily.
> I think it will.  There will be some cases where it might be
> necessary to add modules of convenience functions to make it easier
> to use applications that cross boundaries, but I think these
> cases will be rare.

I would be much more comfortable if there was just one integer type,
or if at least binary ints would mix freely with decimal ints.  I see
a lot of use for decimal *floating point* (more predictable
arithmetic, calculator style), and also a lot of use for decimal
*fixed point* (money calculations), but I don't see the need for
distinguishing the radix of integers.

> Immediately following the introduction of the decimal number types
> all binary modules will work as they work today.  There will be no
> additional pain to continue using those modules.  There will be no
> decimal modules, so there is no problem with making them work with
> the binary modules.  As decimal module users start developing
> applications they will develop techniques for working with the
> binary modules.  Initially it may require a significant effort, but
> eventually boundaries will be created and the two domains will
> coexist.

You make it sound as if most of the standard library would not be
useful for decimal users.  I doubt that.  Decimal users also need to
parse XML, do bisection on lists, use database files, and so on.

> > Another example of the kind of problem your approach runs into: what
> > should the type of len("abc") be?  3d or 3b?  Should it depend on the
> > default mode?
> That is an interesting question.  With my current proposal the following
> would be required:
> stlen = decimal(len("abc"))
> A dlen() function could be added, or perhaps allowing the automatic
> promotion of int to a decimal would be a reasonable exception.  That
> is one case where there is no chance of data loss.  I'm not opposed
> to automatic conversions if there is no danger of errors being
> introduced.

OK, then we agree.  Let's freely allow mixing decimal and binary
integers.  That makes much more sense.
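
For illustration only: the `decimal` module that later entered the standard
library did not exist when this was written, and it is not the PEP under
discussion, but it happens to behave this way. Ints mix freely with decimals,
while binary floats are refused in arithmetic. A sketch, assuming Python 3:

```python
from decimal import Decimal

length = len("abc")                  # an ordinary (binary) int
total = Decimal("0.04") * length     # int combines with Decimal, no cast needed
assert total == Decimal("0.12")

# Arithmetic with a binary float raises instead of silently rounding.
try:
    Decimal("0.04") * 1.5
    float_mix_allowed = True
except TypeError:
    float_mix_allowed = False
assert float_mix_allowed is False
```

So `decimal(len("abc"))` style casts are unnecessary for integers, and the
explicit conversion requirement survives only where data loss is possible.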

> > I suppose sequence indexing has to accept decimal as well as
> > binary integers as indexes -- certainly in a decimal program you
> > will want to be able to use decimal integers for indexes.
> That is how I would expect it to work.

But it contradicts your original assertion that decimal and binary
numbers were two incompatible types.  Glad we sorted that out.

[Next message]

> > I'm not optimistic about Michael's PEP.  He seems to insist on a
> > total separation between decimal and binary numbers that I don't
> > believe can work.

> I'm not insisting on total separation.  I propose that we start with
> a requirement that an explicit call be made to a conversion
> function.  These functions would allow a decimal type to be
> converted to a float or to an int.  There would also be conversion
> functions going from a float or an int to a decimal type.

(Except for ints, we have now established.)

> What I would like to avoid is creating a decimal type in Python that
> enables silent errors that are difficult to recognize.  Allowing
> automatic coercion between the binary and decimal types will open
> the door to errors that would be detected if a conversion is
> required.  If at some point in the future it becomes apparent that a
> particular form of coercion is safe and useful it could be added.
> I'd like to move slowly on opening up this potential trouble spot.

I recommend that you make a more complete analysis of what errors you
want to avoid.  Every binary number can be represented exactly in
decimal if you allow enough digits.
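
A sketch of that claim, assuming Python 3 and the `decimal` and `fractions`
modules of today's standard library (neither existed at the time of this
thread). Converting a binary float to decimal is exact; it just takes many
digits:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact = Decimal(0.1)
assert exact != Decimal("0.1")               # the double is not exactly 1/10
assert Fraction(exact) == Fraction(0.1)      # but the conversion lost nothing
assert str(exact).startswith("0.1000000000000000055511151231257827")
```

The surprise in `0.1` comes from parsing the literal into binary, not from
converting the resulting binary value into decimal.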

On the other hand, if you are thinking of decimal floating point, some
decimal calculations will also lose precision.  If you never want to
lose precision, the radix of the numbers is a red herring, and you
might as well use rationals under the covers.  If you allow the kind
of precision loss that decimal floating point can cause, I would like
to understand more about what it *is* that you are trying to avoid
with your Draconian separation rule.  Floating point decimal
arithmetic cannot avoid loss of precision for division (e.g. 1d/3d
cannot be represented exactly with a finite number of decimal
digits).  Fixed point decimal arithmetic isn't any better.
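
The division case can be sketched as follows, again assuming the later
standard-library `decimal` and `fractions` modules rather than anything
proposed in the PEP. The precision of 28 digits is that module's default,
set explicitly here so the result does not depend on ambient settings:

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 28                       # 28 significant decimal digits
    third = Decimal(1) / Decimal(3)     # rounded to 0.333...3 (28 threes)
    round_trip = third * 3              # 0.999...9, not 1

assert round_trip != Decimal(1)

# A rational keeps 1/3 exact, which is the "rationals under the covers" option.
exact_third = Fraction(1, 3)
assert exact_third * 3 == 1
```

Changing the radix moves which fractions are inexact; only a rational
representation removes the loss for division entirely.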

> >  I haven't replied to him yet because I can't explain it well
> > enough yet -- but I don't believe there's much of a future in his
> > particular idea.
> I guess I'm not understanding something about the direction you are
> taking Python.  As I understood the goals of the CP4E project you
> were attempting to make Python appealing to a wider audience and
> make it possible for everyone to learn to write programs.  And then
> there are occasional references to a Python 3k which will fix some
> Python warts.  My proposal moves Python towards these goals, while
> retaining full backwards compatibility.  I am not trying to create a
> new interpreter.

I think you haven't completely thought through the rules you are
proposing, and you haven't stated your underlying goals very clearly.
I believe the rules that you *claim* to propose won't further your
goals, but it seems that you aren't sure of the rules you propose and
maybe you aren't sure of your goal either.  Under these adverse
circumstances I'm trying to tease out a set of rules that might
further the kind of goal I *think* you want to obtain, but it's hard
because you have overspecified your "solution".

> I'm trying to make the current interpreter useful to a wider market.

Adding an Oracle module to the standard library would probably do more
to further that goal than any wrangling with the numeric model that we
can carry out here... :-)

> What is it you are trying to accomplish in the process of "unifying
> the numerical types" in Python?

Removing specific warts of the current numeric system that require the
programmer to be aware of more details than necessary.  We will never
be able to remove the need for careful numerical analysis of
algorithms involving floating point (be it binary or decimal).  But we
can certainly remove the need to be aware of the number of bits in a
machine word (long/int unification, PEP 237) or the need to explicitly
promote ints to floats in certain cases (PEP 238).
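
For the PEP 238 half of that sentence, a minimal sketch assuming a Python 3
interpreter, where true division is the default behavior (the variable names
are only illustrative):

```python
# int / int yields the mathematically expected result without float() casts.
half = 1 / 2
assert half == 0.5

# Truncating division is still available, but has to be spelled explicitly.
floored = 7 // 2
assert floored == 3
assert 7 / 2 == 3.5
```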

--Guido van Rossum (home page: http://www.python.org/~guido/)