PEP for adding a decimal type to Python
The PEP I posted yesterday, which currently doesn't have a number, addresses the syntactic issues of adding a decimal number type to Python, and it investigates how to safely introduce the new type in a language with a large base of legacy code. The PEP does not address the definition of how decimal numbers should be implemented in Python. This topic has been the subject of other PEPs.
The PEP also proposes the definition of a new language dialect that makes some small improvements on the syntax of the classic Python language. The changes to the numerical model are tailored to make the language attractive to two very important markets. Many of the users attracted from these markets may initially have little or no interest in classic Python. They may not even know that the Python language exists. They will happily use a language called dpython that works very well for their profession.
The interesting thing about the proposed language is how little effort will be required to create and maintain it. The additions to Python were straightforward and the total patch was only a few hundred lines.
The prototype implementation uses the following rules when interpreting the type to be created from a number literal.
literal   '.py'     '.dp'     interactive   interactive
value     file      file      python        dpython
2.2b      float     float     float         float
2b        int       int       int           int
2.2       float     decimal   float         decimal
2         int       decimal   int           decimal
2.2d      decimal   decimal   decimal       decimal
2d        decimal   decimal   decimal       decimal
Based on a comment from Guido I've decided to change the 'f' to 'b' in the next version of dpython. That will be more descriptive of the distinction between the types.
[Michael]
This was a proposal for a mechanism for mingling types safely. It was not intended as a definition of how decimal numbers should be implemented. My implementation tests the interaction of the current number types with the decimal type and I only completed enough of the decimal type implementation to support this testing. I was not expecting to discuss how decimal types should work. That has been discussed already. I was primarily interested in testing the effects of adding a new number type as I described in the PEP.
Can you summarize the rules you used for mixed arithmetic? I forget what your PEP said would happen when you add a decimal 1 to a binary 1. Is the result decimal 2 or binary 2? Why?
The rule is very simple. You can't mix the types. You must explicitly cast a binary to a decimal or a decimal to a binary. This introduces the least chance of error. This pedantic behavior is very important in fields like accounting. I want accountants to think of the proposed dpython language as the COBOL version of Python :-)
This approach is also the correct one to take for newbies. They will get a nice clean exception if they mix the types. This error will be something they can easily look up in the documentation. An unexpected answer, like 1/2=0, will just leave them scratching their heads.
This proposal tries to be consistent with what I like about Python and what I think makes Python a great language. The implementation maintains complete backwards compatibility and it requires that programmers explicitly state that they want to do something rather than have bad things happen unexpectedly.
Mixing different types of numbers can lead to bugs that are very difficult to identify. The errors that would occur when binary numbers are used instead of decimals would be particularly difficult to detect. The answers would always be very close, and sometimes they would be correct. Without the use of an explicit cast these errors would be silent. The price paid for being pedantic will be the occasional need to add an int() or float() around a decimal number or a decimal() around a float or int.
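The "no mixing without an explicit cast" rule can be tried out with the decimal module that later shipped in Python's standard library. This is only an illustration, not the prototype discussed in this thread: the standard Decimal refuses arithmetic with float but, unlike the rule described above, does accept int operands. The variable names are made up for the example.

```python
from decimal import Decimal

price = Decimal("1.10")
tax = Decimal("2.20")

# Mixing a decimal with a binary float is refused outright.
try:
    price + 2.2
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

# Explicit conversions in either direction are always available.
total = price + tax                      # exact decimal arithmetic
as_binary = float(total)                 # decimal -> binary, on request
from_binary = price + Decimal(str(2.2))  # binary -> decimal, on request

# The silent near-miss that plain binary floats produce instead.
binary_sum = 1.1 + 2.2                   # close to, but not exactly, 3.3
```

The exception is the "nice clean" failure the message argues for; the last line is the kind of almost-right answer that would otherwise go unnoticed.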
What did you think of the idea of adding a new command and file format?
I don't think that would be necessary. I'd prefer the 'd' and 'f' (or maybe 'b'?) suffixes to be explicit, perhaps combined with an optional per-module directive to set the default. This would be more robust than keying on the filename extension.
Why do you think a directive statement would be more robust than using a file suffix or command name as the directive? I'll try to explain why I think the opposite is true. Take the example of teaching a newbie to program. They must be told some basic things. For instance, they will have only been told to use a specific suffix for the file name in order to create a new module. So how do you make sure that the newbie always uses decimal numbers? If a directive statement is required then the newbie must remember to always add this statement at the top of a file. If they forget they will not get the expected results. With the file suffix based approach they will have to use a '.dp' suffix for the file name of a new module. If they are told to use a '.dp' suffix from the outset then they are very unlikely to type '.py' instead of '.dp' by accident, whereas forgetting to add a directive would be a silent error that they might easily make.
Your request to have an explicit 'd' and 'f' is already implemented. The prototype implementation allows an explicit 'd' or 'f' to be used at any time. The rules on the interpretation of the values that have no suffix were defined earlier.
The prototype implementation simply uses the suffix of the module file and the name of the command as the directive. This approach provides a very natural language experience to someone working in a profession that normally uses decimal numbers. They are not treated as second class citizens who must endure the clutter of magic directive statements at the top of every module they create. They just use their special command and the file extension.
If you have to change the default globally, I'd prefer a command line option. After all it's only the scanner that needs to know about the different defaults, right?
I think there would be a problem with only using a command line option. It would work for files that are named on the command line and for code being interpreted in an interactive session. However, for imported modules the meaning of a number literal must be based on the author's intentions when the module was created. This means that the interpreter must recognize the type of file so it can determine how to compile the literals defined in the module. If the command line option determines how a scanner is to convert the number literals then a module source file could incorrectly be converted if the wrong command line option were used.
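A minimal sketch of that argument, assuming the decision is made per source file: the file suffix decides for modules, and the command name matters only when there is no file (an interactive session). The function name and parameters are hypothetical and are not taken from the prototype.

```python
import os


def decimal_is_default(source_path=None, command="python"):
    """Decide whether unadorned literals should compile as decimals.

    A module's own suffix always wins, so an imported '.py' file keeps
    binary literals even when the interpreter was started as dpython.
    """
    if source_path is not None:
        return os.path.splitext(source_path)[1] == ".dp"
    # No source file: an interactive session follows the command name.
    return os.path.basename(command) == "dpython"
```

With a purely global command line switch the first branch would not exist, which is the failure mode described above: a module could be compiled with a default its author never intended.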
I wonder about the effectiveness of the default though. If you write a module for decimal arithmetic, how do you prevent a caller from passing in a binary number?
Since the module is written with decimal numbers an exception would be raised if a binary number was used where a decimal number was required. For instance:
---------------
#File spam.py
a = 1.0
---------------
#File eggs.dp
import spam
c = spam.a + 1.0
---------------
The name 'a' was compiled into a float type object when the spam.py file was scanned. So when the expression being assigned to 'c' is executed it would result in a TypeError being raised because a float was being added to a decimal.
decimalobject.c so I could test the impact of introducing an additional command and file format to Python. I expect this code to be replaced. As I said in the PEP I also think the decimal number implementation will evolve into a type that supports inheritance.
Please, please, please, unify all these efforts. A decimal PEP would be a good one, but there should be only one.
Absolutely. The PEP process is supposed to formalize the capture of ideas so they can be referenced. This PEP is mostly orthogonal to Aahz's proposal. They can be merged, or we can reference each other's PEPs. I'm probably not the best choice for implementing the decimal number semantics, so I'd be happy to work with Aahz.
Just a suggestion which might also open the door for other numeric type extensions to play along nicely:
Would it make sense to have an extensible registry of constructors for numeric types which maps number literal modifiers to constructors?
I am thinking of
123L -> long("123")
123i -> int("123")
123.45f -> float("123.45")
The registry would map 'L' to long(), 'i' to int(), 'f' to float() and be extensible in the sense that e.g. an extension like mxNumber could register its own mappings, which would make the types defined in these extensions much more accessible without having to patch the interpreter. mxNumber for example could then register 'r' to map to mx.Number.Rational(), and a user could then write 1/2r, which would map to 1 / mx.Number.Rational("2") and generate a Rational number object for 1/2.
The registry would have to be made smart enough to separate integer notations from floating point ones and use two separate default mappings for these, e.g. '<int>' -> int() and '<float>' -> float().
The advantage of such a mechanism would be that a user could easily change the literal semantics at his/her taste.
Note that I don't think that we really need a separate interpreter just to add decimals or rationals to the core. All that is needed is some easy way to construct these number objects without too much programming overhead (i.e. number of keys to hit ;-).
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
Just a suggestion which might also open the door for other numeric type extensions to play along nicely:
Would it make sense to have an extensible registry of constructors for numeric types which maps number literal modifiers to constructors ?
I am thinking of
123L -> long("123")
123i -> int("123")
123.45f -> float("123.45")
The registry would map 'L' to long(), 'i' to int(), 'f' to float() and be extensible in the sense that e.g. an extension like mxNumber could register its own mappings, which would make the types defined in these extensions much more accessible without having to patch the interpreter. mxNumber for example could then register 'r' to map to mx.Number.Rational(), and a user could then write 1/2r, which would map to 1 / mx.Number.Rational("2") and generate a Rational number object for 1/2.
The registry would have to be made smart enough to separate integer notations from floating point ones and use two separate default mappings for these, e.g. '<int>' -> int() and '<float>' -> float().
The advantage of such a mechanism would be that a user could easily change the literal semantics at his/her taste.
Note that I don't think that we really need a separate interpreter just to add decimals or rationals to the core. All that is needed is some easy way to construct these number objects without too much programming overhead (i.e. number of keys to hit ;-).
Funny, I had a similar idea today in the shower (always the best place to think :-). I'm not sure exactly how it would work yet -- currently, literals are converted to values at compile-time, so the registry would have to be available to the compiler, but the concept seems to make more sense if it is available and changeable at runtime. Nevertheless, we should keep this in mind. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Just a suggestion which might also open the door for other numeric type extensions to play along nicely:
Would it make sense to have an extensible registry of constructors for numeric types which maps number literal modifiers to constructors ?
I am thinking of
123L -> long("123")
123i -> int("123")
123.45f -> float("123.45")
The registry would map 'L' to long(), 'i' to int(), 'f' to float() and be extensible in the sense that e.g. an extension like mxNumber could register its own mappings, which would make the types defined in these extensions much more accessible without having to patch the interpreter. mxNumber for example could then register 'r' to map to mx.Number.Rational(), and a user could then write 1/2r, which would map to 1 / mx.Number.Rational("2") and generate a Rational number object for 1/2.
The registry would have to be made smart enough to separate integer notations from floating point ones and use two separate default mappings for these, e.g. '<int>' -> int() and '<float>' -> float().
The advantage of such a mechanism would be that a user could easily change the literal semantics at his/her taste.
Note that I don't think that we really need a separate interpreter just to add decimals or rationals to the core. All that is needed is some easy way to construct these number objects without too much programming overhead (i.e. number of keys to hit ;-).
Funny, I had a similar idea today in the shower (always the best place to think :-). I'm not sure exactly how it would work yet -- currently, literals are converted to values at compile-time, so the registry would have to be available to the compiler, but the concept seems to make more sense if it is available and changeable at runtime.
True, but deferring the conversion to runtime (by e.g. using literal descriptors ;-) would cause a significant slowdown. So, I believe that the compiler would have to be told before starting the compile process or within the process by looking at some magical constant/comment in the source code (I think that this ought to be a per-file overridable setting, since some code may simply fail to work if it suddenly starts to work with different types).
Nevertheless, we should keep this in mind.
I could reformat the above into a PEP or Michael could simply add the idea as a section to his PEP.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
Funny, I had a similar idea today in the shower (always the best place to think :-). I'm not sure exactly how it would work yet -- currently, literals are converted to values at compile-time, so the registry would have to be available to the compiler, but the concept seems to make more sense if it is available and changeable at runtime.
True, but deferring the conversion to runtime (by e.g. using literal descriptors ;-) would cause a significant slowdown.
So, I believe that the compiler would have to be told before starting the compile process or within the process by looking at some magical constant/comment in the source code (I think that this ought to be a per-file overridable setting, since some code may simply fail to work if it suddenly starts to work with different types).
This may be the first place where a 'directive' statement actually makes sense to me.
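As a rough illustration of a per-file "magical comment", the sketch below scans the first lines of a source file for a directive before any literal is compiled. The directive spelling (`# numeric-default: ...`) and the function name are pure assumptions; nothing in this thread settles on a syntax.

```python
import re

# Hypothetical directive spelling, chosen only for this example.
_DIRECTIVE = re.compile(r"^\s*#\s*numeric-default:\s*(decimal|binary)\s*$")


def numeric_default(source, fallback="binary", max_lines=2):
    """Return the literal default a source file asks for.

    Only the first few lines are inspected, so the answer is known
    before the compiler reaches any number literal.
    """
    for line in source.splitlines()[:max_lines]:
        match = _DIRECTIVE.match(line)
        if match:
            return match.group(1)
    return fallback
```

A file without the comment silently falls back to the binary default, which is exactly the forgotten-directive case raised earlier in the thread.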
Nevertheless, we should keep this in mind.
I could reformat the above into a PEP or Michael could simply add the idea as a section to his PEP.
I'm not optimistic about Michael's PEP. He seems to insist on a total separation between decimal and binary numbers that I don't believe can work. I haven't replied to him yet because I can't explain it well enough yet -- but I don't believe there's much of a future in his particular idea. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Friday 27 July 2001 11:50 am, Guido van Rossum wrote:
I'm not optimistic about Michael's PEP. He seems to insist on a total separation between decimal and binary numbers that I don't believe can work.
I'm not insisting on total separation. I propose that we start with a requirement that an explicit call be made to a conversion function. These functions would allow a decimal type to be converted to a float or to an int. There would also be conversion functions going from a float or an int to a decimal type. What I would like to avoid is creating a decimal type in Python that enables silent errors that are difficult to recognize. Allowing automatic coercion between the binary and decimal types will open the door to errors that would be detected if a conversion is required. If at some point in the future it becomes apparent that a particular form of coercion is safe and useful it could be added. I'd like to move slowly on opening up this potential trouble spot.
I haven't replied to him yet because I can't explain it well enough yet -- but I don't believe there's much of a future in his particular idea.
I guess I'm not understanding something about the direction you are taking Python. As I understood the goals of the CP4E project you were attempting to make Python appealing to a wider audience and make it possible for everyone to learn to write programs. And then there are occasional references to a Python 3k which will fix some Python warts. My proposal moves Python towards these goals, while retaining full backwards compatibility. I am not trying to create a new interpreter. I'm trying to make the current interpreter useful to a wider market. What is it you are trying to accomplish in the process of "unifying the numerical types" in Python?
On Friday 27 July 2001 06:13 am, M.-A. Lemburg wrote:
Just a suggestion which might also open the door for other numeric type extensions to play along nicely:
Would it make sense to have an extensible registry of constructors for numeric types which maps number literal modifiers to constructors ?
I am thinking of
123L -> long("123")
123i -> int("123")
123.45f -> float("123.45")
With the changes made in the prototype this would be relatively easy to implement. Using an 'i' suffix could be confused with an imaginary number. It would be very easy for someone to mistakenly type 12i instead of 12j and get an integer instead of an imaginary number. The next implementation of my PEP will change the 'f' to a 'b', as in binary number. The same suffix is used for both integer and float because they work together as a binary implementation of numbers. With the decimal number implementation there is only one type for both integer and float.
123b -> int("123")
123.45b -> float("123.45")
The registry would map 'L' to long(), 'i' to int(), 'f' to float() and be extensible in the sense that e.g. an extension like mxNumber could register its own mappings, which would make the types defined in these extensions much more accessible without having to patch the interpreter. mxNumber for example could then register 'r' to map to mx.Number.Rational(), and a user could then write 1/2r, which would map to 1 / mx.Number.Rational("2") and generate a Rational number object for 1/2.
The registry would have to be made smart enough to separate integer notations from floating point ones and use two separate default mappings for these, e.g. '<int>' -> int() and '<float>' -> float().
The tokenizer just passes a number with a suffix as a string to a function in compile.c. The number in the string could be any valid number, e.g. 123, 123.45, 123.45e-3, or .123. The function processing the string then determines what type of number object to create based on the suffix. It would be the responsibility of the function that processes the 'r' suffix to accept or reject the number encoded in the string.
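The suffix registry and the "pass the literal as a string to a per-suffix function" idea can be sketched in a few lines of Python. This is a runtime toy, not the compile-time mechanism under discussion: `registry` and `make_number` are hypothetical names, and the standard library's Decimal and Fraction stand in for the prototype's decimal type and for mx.Number.Rational.

```python
import re
from decimal import Decimal
from fractions import Fraction

# A number (123, 123.45, 123.45e-3, .123) followed by an optional letter.
_LITERAL = re.compile(
    r"^(?P<number>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?P<suffix>[A-Za-z]?)$"
)

# Suffix -> constructor. '<int>' and '<float>' are the two defaults used
# when a literal carries no suffix; extensions could add their own keys.
registry = {
    "<int>": int,
    "<float>": float,
    "d": Decimal,
    "r": Fraction,
}


def make_number(text):
    """Build a number object from literal text such as '123.45d'."""
    match = _LITERAL.match(text)
    if match is None:
        raise SyntaxError(f"invalid number literal {text!r}")
    number, suffix = match.group("number", "suffix")
    if not suffix:
        is_float = any(ch in number for ch in ".eE")
        suffix = "<float>" if is_float else "<int>"
    try:
        constructor = registry[suffix]
    except KeyError:
        raise SyntaxError(f"unknown numeric suffix {suffix!r}") from None
    # The constructor itself accepts or rejects the digits it is handed.
    return constructor(number)
```

Under these assumptions `1 / make_number("2r")` yields `Fraction(1, 2)`, and rebinding `registry["<float>"] = Decimal` changes what unadorned literals such as `2.2` produce, which is the user-adjustable default described in the quoted suggestion.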
The advantage of such a mechanism would be that a user could easily change the literal semantics at his/her taste.
Note that I don't think that we really need a separate interpreter just to add decimals or rationals to the core. All that is needed is some easy way to construct these number objects without too much programming overhead (i.e. number of keys to hit ;-).
I wasn't suggesting creating a separate interpreter, I was suggesting adding a simple mechanism for allowing a new dialect of Python to be added to the existing interpreter. This new dialect would be easier to use for certain types of programming activities. The use of a decimal number type as the default type in this new language dialect is only one change that was proposed. Another would be to use Unicode as the default character set. This would allow Unicode characters to be in strings without needing to escape them. The proposal also suggests removing the tab character from indentation of blocks. The goal is to create a language that would clean up some of the warts in the Python syntax and take advantage of the capabilities of modern IDE environments.
The idea of adding a new language on top of the existing infrastructure isn't that unusual. The gcc compiler can process many languages to produce a common machine-dependent object code. I can envision taking my simple changes a few steps further and turning the entire tokenizer into a replaceable unit. This approach would allow projects to build other languages on top of the Python byte code interpreter. Imagine having Javascript, VBasic, or sh tokenizer frontends generating Python bytecodes. Think of it as the pyNET architecture :-) This change probably belongs in Python4k.
Perhaps the PEP should be split into two parts. The first PEP would be to add decimal characters with a 'd' suffix and also allow suffix characters to be added to the default float and integer types. I think everyone agrees that this change is needed. The second PEP will cover the proposed creation of the dpython dialect. This PEP would be a container for proposed changes to the Python syntax that would make the language easier to teach to newbies and easier to use in a financial application. Your suggestion to allow additional numerical types to be added by users would be included in the first PEP if the BDFL thinks this is a good idea.
[me]
Note that I don't think that we really need a separate interpreter just to add decimals or rationals to the core. All that is needed is some easy way to construct these number objects without too much programming overhead (i.e. number of keys to hit ;-).
[Michael]
I wasn't suggesting creating a separate interpreter, I was suggesting adding a simple mechanism for allowing a new dialect of Python to be added to the existing interpreter.
Understood. I see no big difference in having two binaries or one binary with a command line option; the two binaries effectively contain the same functionality, just with a different default. I would vote for one binary; if you really think it's too much for your users to say "python -d" instead of "dpython", give them a script. (I know that the -d option currently means something else. That's a detail to worry about later.)
This new dialect would be easier to use for certain types of programming activities. The use of a decimal number type as the default type in this new language dialect is only one change that was proposed.
I'm not very fond of having multiple dialects. There are lots of contexts where the dialect in use is not explicitly mentioned (e.g. when people discuss fragments of Python code).
Another would be to use Unicode as the default character set. This would allow Unicode characters to be in strings without needing to escape them.
That's not a dialect, that's a different input encoding. MAL already has a PEP for that.
The proposal also suggests removing the tab character from indentation of blocks. The goal is to create a language that would clean up some of the warts in the Python syntax and take advantage of the capabilities of modern IDE environments.
What does removing tab characters have to do with decimal numbers? One topic per PEP, please!
The idea of adding a new language on top of the existing infrastructure isn't that unusual. The gcc compiler can process many languages to produce a common machine-dependent object code. I can envision taking my simple changes a few steps further and turning the entire tokenizer into a replaceable unit. This approach would allow projects to build other languages on top of the Python byte code interpreter. Imagine having Javascript, VBasic, or sh tokenizer frontends generating Python bytecodes. Think of it as the pyNET architecture :-) This change probably belongs in Python4k.
Or in Python .NET. Decoupling the various parts of the parse+compile pipeline is something I've considered. But again this has nothing to do with decimal numbers: your proposal allows the mixing of decimal and binary numbers (as long as one of them uses an explicit base indicator) so you don't really need two parsers -- you need one tokenizer plus a way to specify the default numeric base for literals.
Perhaps the PEP should be split into two parts. The first PEP would be to add decimal characters with a 'd' suffix and also allow suffix characters to be added to the default float and integer types. I think everyone agrees that this change is needed.
It's needed *if* we agree that we need a decimal data type.
The second PEP will cover the proposed creation of the dpython dialect. This PEP would be a container for proposed changes to the Python syntax that would make the language easier to teach to newbies and easier to use in a financial application.
I'll have to go back to your defense of the two dialect approach, but I think it's neither sufficient nor necessary.
Your suggestion to allow additional numerical types to be added by users would be included in the first PEP if the BDFL thinks this is a good idea.
Well, sometimes more generality than you need hurts. I'm not convinced that we need an open-ended set of numeric literals. But in the light of the unified numeric model, we may need ways to make exactness or inexactness explicit, and/or we may need a way to specify rational numbers. If we can fit all of these in the number-with-letter-suffix mold, that would be nice for the lexer, I suppose. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, Jul 27, 2001 at 12:35:34PM -0400, Guido van Rossum wrote:
But again this has nothing to do with decimal numbers: your proposal allows the mixing of decimal and binary numbers (as long as one of them uses an explicit base indicator) so you don't really need two parsers -- you need one tokenizer plus a way to specify the default numeric base for literals.
If this were possible, then could it be a per-module decision what "1/2" produces, depending on whether unadorned whole-number literals correspond to ClassicInt or NewInt? That sounds miles better than writing "1//2" to me.
Jeff
On Friday 27 July 2001 12:35 pm, Guido van Rossum wrote:
[me]
I wasn't suggesting creating a separate interpreter, I was suggesting adding a simple mechanism for allowing a new dialect of Python to be added to the existing interpreter.
Understood. I see no big difference in having two binaries or one binary with a command line option; the two binaries effectively contain the same functionality, just with a different default. I would vote for one binary; if you really think it's too much for your users to say "python -d" instead of "dpython", give them a script. (I know that the -d option currently means something else. That's a detail to worry about later.)
I decided to use a symbolic link to a different command name to set the default encoding of numerical literals. I did this because referring to the 'dpython' command is more concise than "python -d". The executable could also have command options to select between python and dpython modes.
I'm not very fond of having multiple dialects. There are lots of contexts where the dialect in use is not explicitly mentioned (e.g. when people discuss fragments of Python code).
I'm not fond of dialects when they don't serve a significant purpose. However, I believe it would be useful to at least discuss creating a special purpose "safe" mode for the Python lexer. This mode would be attractive to newbies and financial programmers. Calling this a new dialect is an overstatement. It is more like defining a subset of the language that uses a special vocabulary for working with decimal types.
Another would be to use Unicode as the default character set. This would allow Unicode characters to be in strings without needing to escape them.
That's not a dialect, that's a different input encoding. MAL already has a PEP for that.
I know about the PEP. I was referring to making it the default string type for a '.dp' file. There would be no prefix 'u' required. I'll remove this and the other unrelated items from the decimal type PEP. If you don't agree with the idea of adding a dpython lexer mode then there is no point in discussing the features that would be in that mode.
The idea of adding a new language on top of the existing infrastructure isn't that unusual. The gcc compiler can process many languages to produce a common machine-dependent object code. I can envision taking my simple changes a few steps further and turning the entire tokenizer into a replaceable unit. This approach would allow projects to build other languages on top of the Python byte code interpreter. Imagine having Javascript, VBasic, or sh tokenizer frontends generating Python bytecodes. Think of it as the pyNET architecture :-) This change probably belongs in Python4k.
Or in Python .NET. Decoupling the various parts of the parse+compile pipeline is something I've considered.
Did you decide against it, or has it just not been a high enough priority?
But again this has nothing to do with decimal numbers: your proposal allows the mixing of decimal and binary numbers (as long as one of them uses an explicit base indicator) so you don't really need two parsers -- you need one tokenizer plus a way to specify the default numeric base for literals.
That is exactly what I implemented. The dpython command and the '.dp' file suffix cause the Py_USE_DECIMAL_AS_DEFAULT[1] flag to be set. When this flag is set decimal numbers are used for literals.
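The effect of that flag on a single literal can be written down as a small decision function that reproduces the table of rules from the first message of the thread. It is only a sketch: `literal_type` and its `decimal_default` parameter (standing in for the Py_USE_DECIMAL_AS_DEFAULT flag) are illustrative names, and only plain decimal-notation literals with an optional 'b' or 'd' suffix are handled.

```python
def literal_type(literal, decimal_default):
    """Name the type a number literal would compile to.

    decimal_default plays the role of the flag: True for a '.dp' file
    or an interactive dpython session, False otherwise.
    """
    suffix = literal[-1].lower()
    if suffix == "d":
        return "decimal"              # explicit decimal, in any mode
    explicit_binary = suffix == "b"
    if not explicit_binary and decimal_default:
        return "decimal"              # unadorned literal, decimal mode
    body = literal[:-1] if explicit_binary else literal
    # Binary numbers still split into int and float by their notation.
    is_float = "." in body or "e" in body.lower()
    return "float" if is_float else "int"
```

Only the two unadorned rows of the table depend on the flag; a literal with an explicit suffix means the same thing in every file.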
I'll have to go back to your defense of the two dialect approach, but I think it's neither sufficient nor necessary.
I have mixed too many ideas into a PEP. I'll rework the PEP to remove the cruft and focus on the addition of decimal numbers. I'll move the other ideas into a separate PEP.
Well, sometimes more generality than you need hurts. I'm not convinced that we need an open-ended set of numeric literals. But in the light of the unified numeric model, we may need ways to make exactness or inexactness explicit, and/or we may need a way to specify rational numbers. If we can fit all of these in the number-with-letter-suffix mold, that would be nice for the lexer, I suppose.
I worry about a "unified numerical model" getting overly complex. I think decimal numbers help because they are a better choice than binary numbers for a significant percentage of all software applications. I know that rational numbers are important in some applications. Am I overlooking some huge class of applications that use rationals? While Tim and some of the other Pythoneers can probably think of dozens of specialized numerical types, I would venture to guess that binary types and a decimal type probably cover 90% of all the users' requirements.
[1] I'll be renaming the flag to this in the next version. The flag is currently called Py_NEW_PARSER. I named it that because at one time I was creating a new parser. I trimmed the changes down to just a few edits of the tokenizer and compile.c.
[Michael]
I'm not fond of dialects when they don't serve a significant purpose. However, I believe it would be useful to at least discuss creating a special purpose "safe" mode for the Python lexer. This mode would be attractive to newbies and financial programmers. Calling this a new dialect is an overstatement. It is more like defining a subset of the language that uses a special vocabulary for working with decimal types.
Sounds like a dialect to me. But alright, I'll take your word for it. :-)
[Michael]
Another would be to use Unicode as the default character set. This would allow Unicode characters to be in strings without needing to escape them.
[Guido]
That's not a dialect, that's a different input encoding. MAL already has a PEP for that.
[Michael]
I know about the PEP. I was referring to making it the default string type for a '.dp' file. There would be no prefix 'u' required.
Have you thought this through? What would be the input encoding? How do you expect your programmers to edit their Unicode files? Otherwise, the only effect of making all string literals Unicode strings is to break most of the standard library. You can get this effect with "python -U" today. It's not pretty. (That option exists to see how much progress has been made with Python's Unicodification, not for anything very practical.)
I'll remove this and the other unrelated items from the decimal type PEP.
It would indeed be better to focus on one idea at a time.
If you don't agree with the idea of adding a dpython lexer mode then there is no point in discussing the features that would be in that mode.
Maybe you can rewrite the PEP to explain the idea better. It wasn't very clear the first time.
Or in Python .NET. Decoupling the various parts of the parse+compile pipeline is something I've considered.
Did you decide against it, or has it just not been a high enough priority?
It's one of those many "would-be-nice" things that I never get to...
But again this has nothing to do with decimal numbers: your proposal allows the mixing of decimal and binary numbers (as long as one of them uses an explicit base indicator) so you don't really need two parsers -- you need one tokenizer plus a way to specify the default numeric base for literals.
That is exactly what I implemented. The dpython command and the '.dp' extension cause the Py_USE_DECIMAL_AS_DEFAULT[1] flag to be set. When this flag is set, decimal numbers are used for literals.
Where is this flag set? Is it a global variable? If my main program has the .dp extension, does the flag remain set for all other modules that it imports?
I'll have to go back to your defense of the two dialect approach, but I think it's neither sufficient nor necessary.
I have mixed too many ideas into a PEP. I'll rework the PEP to remove the cruft and focus on the addition of decimal numbers. I'll move the other ideas into a separate PEP.
Posterity will be grateful.
Well, sometimes more generality than you need hurts. I'm not convinced that we need an open-ended set of numeric literals. But in the light of the unified numeric model, we may need ways to make exactness or inexactness explicit, and/or we may need a way to specify rational numbers. If we can fit all of these in the number-with-letter-suffix mold, that would be nice for the lexer, I suppose.
I worry about a "unified numerical model" getting overly complex.
Funny. I think that a unified numeric model will take away some complexity from the current model; for example the programmer would no longer have to be aware of the limit on int values, so nobody would have to learn about long any more.
I think decimal numbers help because they are a better choice than binary numbers for a significant percentage of all software applications.
(Just not for most of the apps that are likely to be written in Python today. :-)
I know that rational numbers are important in some applications. Am I overlooking some huge class of applications that use rationals?
I doubt it -- if I was allowed to add exactly *one* numeric type to Python, and I had to choose between decimal and rational, I'd choose decimal. Practicality beats purity.
While Tim and some of the other Pythoneers can probably think of dozens of specialized numerical types, I would venture to guess that binary types and a decimal type probably cover 90% of all the users' requirements.
Add rational, and I'd agree.
[1] I'll be renaming the flag to this in the next version. The flag is currently called Py_NEW_PARSER. I named it that because at one time I was creating a new parser. I trimmed the changes down to just a few edits of the tokenizer and compile.c.
Why does a flag variable have an UPPER_CASE name? That normally means the name is a preprocessor symbol.
[Next message]
[Guido]
But I foresee serious problems. Most standard library modules use numbers. Most of the modules using numbers occasionally use a literal (e.g. 0 or 1). According to your PEP, literals in module files ending with .py default to binary. This means that almost any use of a standard library module from your "dpython" will fail as soon as a literal is used.
[Michael]
No, because the '.py' file will generate bytecodes for number literals as binary numbers when the module is compiled. If a '.dp' file imports the contents of a '.py' file, the binary numbers will be imported as binary numbers. If the '.dp' file needs to use a binary number in a calculation with a decimal number, the binary number will have to be cast to a decimal number.
I understood all that. But what if the decimal module wants to pass some numbers into a binary module? Then it has to make sure all the arguments it passes are decimal.
---------------------
#gui.py
BLUE = 155
x_axis = 1024
y_axis = 768
--------------------
#calculator.dp
import gui
ytd_interest = 0.04  # ytd_interest is now a decimal number
win = gui.open_window(gui.bg, x_size=gui.x_axis, y_size=gui.y_axis)
app = win.dialog("Bank Balance", bankbalance_callback)
bb = app.get_bankbalance()  # bb now contains a string
newbalance = decimal(bb) * ytd_interest
# now update the display
app.set_bankbalance(str(newbalance))
-------------------
In the example, numbers from the gui module were used in the calculator module, but they were always handled as binary numbers. The parser did not convert them to decimal numbers because they had been compiled into a gui.pyc file before being loaded into calculator.dp.
Blech. That means that whenever you use a library module that does something useful with your data, you have to convert all your data explicitly to binary, even if it's just integers. Yuck. Bah. (Need I say more? OK, one more then. Argh! :-)
I can't believe that this will work satisfactorily.
I think it will. There will be some cases where it might be necessary to add modules of convenience functions to make it easier to use applications that cross boundaries, but I think these cases will be rare.
I would be much more comfortable if there was just one integer type, or if at least binary ints would mix freely with decimal ints. I see a lot of use for decimal *floating point* (more predictable arithmetic, calculator style), and also a lot of use for decimal *fixed point* (money calculations), but I don't see the need for distinguishing the radix of integers.
Immediately following the introduction of the decimal number types, all binary modules will work as they work today. There will be no additional pain to continue using those modules. There will be no decimal modules, so there is no problem with making them work with the binary modules. As decimal module users start developing applications they will develop techniques for working with the binary modules. Initially it may require a significant effort, but eventually boundaries will be created and the two domains will coexist.
You make it sound as if most of the standard library would not be useful for decimal users. I doubt that. Decimal users also need to parse XML, do bisection on lists, use database files, and so on.
Another example of the kind of problem your approach runs into: what should the type of len("abc") be? 3d or 3b? Should it depend on the default mode?
That is an interesting question. With my current proposal the following would be required:
stlen = decimal(len("abc"))
A dlen() function could be added, or perhaps allowing the automatic promotion of int to a decimal would be a reasonable exception. That is one case where there is no chance of data loss. I'm not opposed to automatic conversions if there is no danger of errors being introduced.
OK, then we agree. Let's freely allow mixing decimal and binary integers. That makes much more sense.
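For comparison, here is a small sketch of how the `decimal` module that later entered the standard library handles this distinction. It is not the prototype discussed in this thread, and `Decimal` stands in for the proposed `decimal()` constructor: ints combine with decimals without a conversion call, while floats are rejected in arithmetic.

```python
from decimal import Decimal

length = len("abc")        # an ordinary (binary) int
rate = Decimal("0.04")

# Explicit conversion, in the spirit of decimal(len("abc")).
explicit = Decimal(length) * rate

# An int mixes with a Decimal directly; the conversion is exact,
# so nothing can be silently lost.
implicit = length * rate

# A binary float does not mix implicitly: arithmetic raises TypeError,
# which forces the programmer to choose a conversion.
try:
    rate * 1.5
    float_mixed_silently = True
except TypeError:
    float_mixed_silently = False
```

Both products are the same decimal value, which matches the point agreed on above: promoting an int is the one conversion with no chance of data loss.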
I suppose sequence indexing has to accept decimal as well as binary integers as indexes -- certainly in a decimal program you will want to be able to use decimal integers for indexes.
That is how I would expect it to work.
But it contradicts your original assertion that decimal and binary numbers were two incompatible types. Glad we sorted that out.
[Next message]
[Guido]
I'm not optimistic about Michael's PEP. He seems to insist on a total separation between decimal and binary numbers that I don't believe can work.
[Michael]
I'm not insisting on total separation. I propose that we start with a requirement that an explicit call be made to a conversion function. These functions would allow a decimal type to be converted to a float or to an int. There would also be conversion functions going from a float or an int to a decimal type.
(Except for ints, we have now established.)
What I would like to avoid is creating a decimal type in Python that enables silent errors that are difficult to recognize. Allowing automatic coercion between the binary and decimal types will open the door to errors that would be detected if a conversion is required. If at some point in the future it becomes apparent that a particular form of coercion is safe and useful it could be added. I'd like to move slowly on opening up this potential trouble spot.
I recommend that you make a more complete analysis of what errors you want to avoid. Every binary can be represented in decimal if you allow enough digits. On the other hand, if you are thinking of decimal floating point, some decimal calculations will also lose precision. If you never want to lose precision, the radix of the numbers is a red herring, and you might as well use rationals under the covers. If you allow the kind of precision loss that decimal floating point can cause, I would like to understand more about what it *is* that you are trying to avoid with your Draconian separation rule. Floating point decimal arithmetic cannot avoid loss of precision for division (e.g. 1d/3d cannot be represented exactly with a finite number of decimal digits). Fixed point decimal arithmetic isn't any better.
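These precision claims can be checked with today's standard library. This is a sketch only: `decimal.Decimal` and `fractions.Fraction` stand in for the decimal and rational types under discussion, and the 28-digit precision is the module's default, set explicitly here for clarity.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 28  # the decimal module's default precision

# Decimal floating point loses precision on division:
# 1/3 is rounded to 28 digits, so multiplying back does not give 1.
third = Decimal(1) / Decimal(3)
decimal_roundtrip = third * 3

# A rational type keeps the value exact.
rational_roundtrip = Fraction(1, 3) * 3

# Every binary float has an exact (possibly long) decimal expansion.
# Converting the float 0.1 shows the value it really stores,
# which is not the same as the decimal literal 0.1.
tenth_as_stored = Decimal(0.1)
half_as_stored = Decimal(0.5)
```

Here `decimal_roundtrip` differs from 1 while `rational_roundtrip` equals 1 exactly, so the radix alone does not decide whether precision is lost.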
I haven't replied to him yet because I can't explain it well enough yet -- but I don't believe there's much of a future in his particular idea.
I guess I'm not understanding something about the direction you are taking Python. As I understood the goals of the CP4E project you were attempting to make Python appealing to a wider audience and make it possible for everyone to learn to write programs. And then there are occasional references to a Python 3k which will fix some Python warts. My proposal moves Python towards these goals, while retaining full backwards compatibility. I am not trying to create a new interpreter.
I think you haven't completely thought through the rules you are proposing, and you haven't stated your underlying goals very clearly. I believe the rules that you *claim* to propose won't further your goals, but it seems that you aren't sure of the rules you propose and maybe you aren't sure of your goal either. Under these adverse circumstances I'm trying to tease out a set of rules that might further the kind of goal I *think* you want to obtain, but it's hard because you have overspecified your "solution".
I'm trying to make the current interpreter useful to a wider market.
Adding an Oracle module to the standard library would probably do more to further that goal than any wrangling with the numeric model that we can carry out here... :-)
What is it you are trying to accomplish in the process of "unifying the numerical types" in Python?
Removing specific warts of the current numeric system that require the programmer to be aware of more details than necessary. We will never be able to remove the need for careful numerical analysis of algorithms involving floating point (be it binary or decimal). But we can certainly remove the need to be aware of the number of bits in a machine word (long/int unification, PEP 237) or the need to explicitly promote ints to floats in certain cases (PEP 238).
--Guido van Rossum (home page: http://www.python.org/~guido/)
Michael McLay wrote:
On Friday 27 July 2001 12:35 pm, Guido van Rossum wrote:
I'm not very fond of having multiple dialects. There are lots of contexts where the dialect in use is not explicitly mentioned (e.g. when people discuss fragments of Python code).
I'm not fond of dialects when they don't serve a significant purpose. However, I believe it would be useful to at least discuss creating a special purpose "safe" mode for the Python lexer. This mode would be attractive to newbies and financial programmers. Calling this a new dialect is an overstatement. It is more like defining a subset of the language that uses a special vocabulary for working with decimal types.
I don't know nothin about no number theory, but I did use a similar dialect technique to implement a PEP 245 prototype using mobius. Like what I've read so far about dpython, its objects from *.pyi files (a superset of python) could be easily intermingled with objects from *.py files. I'm all for no dialects at large, but some people may find need to implement new languages on top of python's run time engine. Especially people embedding python into specialized applications. Mobius was a way to control the python language using the language itself, it would be cool to have this kind of thing stock in python.
-Michel
Michael McLay wrote:
Absolutely. The PEP process is supposed to formalize the capture of ideas so they can be referenced. This PEP is mostly orthogonal to Aahz's proposal. They can be merged, or we can reference each other's PEPs. I'm probably not the best choice for doing the implementation of the decimal number semantics, so I'd be happy to work with Aahz.
Note that I am unwilling to discuss this in the context of any PEP until/unless I finish my implementation. There is already a spec for what I'm doing (Cowlishaw), and I see no point in talking until code is ready for use. If someone wants to take over my work, I won't complain; I've already done the easy work. ;-)
--
--- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista
I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
participants (7)
- aahz@rahul.net
- Guido van Rossum
- Jeff Epler
- M.-A. Lemburg
- Michael McLay
- Michael McLay
- Michel Pelletier