non-Pre-PEP: Syntactic replacement of built-in types with user-defined callables (Take 2... fewer newlines!)

Hi all... again,

Built-in types such as float, string, or list are first-class citizens in Python source files, having syntactic support:

    myfloat = 1.0
    mystring = "my string"
    mylist = [1,2,4,8]
    mydict = {1:"a", 2:"b", 3:"c"}
    myset = {1,2,3}

User-defined classes are second-class citizens, requiring data to be manually converted from a built-in type:

    mydecimal = Decimal("1.00000000000000001")
    myrope = Rope("my rope")
    myblist = BList([1,2,4,8])
    myordereddict = OrderedDict([(1,"a"), (2,"b"), (3,"c")])
    myfrozenset = frozenset([1,2,3])

If there's only one or two conversions needed in a file, then such conversion is not particularly burdensome, but if one wants to consistently use (say) decimals throughout a file then the ability to use a literal syntax makes for prettier source.

Some languages have open types, allowing the addition of methods to built-in types. This is not considered desired behaviour for Python, since modifications made in one module can potentially affect code in other modules.

A typedef is syntactic sugar to allow user-defined replacements to be treated as first-class citizens in code. Typedefs affect only the module in which they appear and do not modify the original type.

To be typedeffable, something must be a builtin type, have a constant/syntactic representation, and be callable. Hence, typedeffable types would be limited to complex, dict, float, int, ?object?, list, slice, set, string, and tuple. No modification is made to __builtins__ or types, so conversion and/or reference to the original type is still possible.

The syntax for a typedef is:

    from MODULE typedef ADAPTOR as TYPE

OR

    typedef BUILTINADAPTOR as TYPE

Syntactic constants of a given type are then wrapped with a call:

    ADAPTOR(SOURCELITERAL)

where SOURCELITERAL is the string that appears in the source code, eg:

    from decimal typedef Decimal as float
    i = 1.000000000000000000000000000001

translates as:

    from decimal import Decimal as float
    i = Decimal("1.000000000000000000000000000001")

Syntactic collections of a given type are always provided with a list of objects, eg:

    from decimal typedef Decimal as float
    from blist typedef BList as list
    b = [1.1, 4.2]

translates as:

    from decimal import Decimal as float
    from blist import BList as list
    b = BList([Decimal("1.1"), Decimal("4.2")])

and

    from collections typedef OrderedDict as dict
    d = {1:"a", 2:"b", 3:"c"}

as:

    from collections import OrderedDict as dict
    d = OrderedDict([(1,"a"), (2,"b"), (3,"c")])

A typedef appears at the start of a module immediately after any __future__ imports.

As no adaptors can be defined in a module before a typedef and typedefs are in no way a forward declaration, "typedef ADAPTOR as TYPE" only works for builtins, since to do otherwise would lead to one of two unpalatable options; either:

a/ definition of adaptors would have to be allowed pre-typedef, which would allow them to be buried in code, making them far easier to miss; or

b/ adaptors would be defined after the typedef, which means that you'd have to handle:

    typedef float as float
    def float(): pass

or:

    typedef MyObject as object
    class MyObject(object): pass

or:

    typedef myfloat as float
    x = 1.1
    def myfloat(): pass

It is true that if a valid typedef is made, the type can be redefined within the module; but the "consenting adults" rule applies -- it's possible to redefine str and float multiple times within a module as well, but that isn't recommended either (and the typedef at the top of the module at least indicates that non-standard behaviour is to be expected).

It is a SyntaxError to typedef the same type more than once:

    from decimal typedef Decimal as float
    from types typedef FloatType as float #SyntaxError("Type 'float' already redefined.")

Spelling: "typedef" is prettier than "pragma", and less likely to be in use than "use", but its behaviour is somewhat different from C's typedef, so perhaps another term may be preferred.

Theoretical Performance Differences: Since a typedef is purely syntactic sugar, and all transformational work would be done at compilation, running code should be no slower (there should be no new opcodes necessary) and by default no faster than performing manual conversion (though typedefs may assist an optimisation). Unless optimisation magic is available, performance-critical code should be careful when typedeffing not to use typedeffed literals in inner loops.

I know it's considered polite to provide code, but any implementation would have to be in C, so please accept this extremely fake dummy implementation which in no way resembles the way things really work as a poor substitute:

    typedefs = {
        FLOATTYPE: None,
        INTTYPE: None,
        LISTTYPE: None,
        DICTTYPE: None,
        ...
    }

    while True:
        line = lines.next()
        type, module, adaptor = parsetypedef(line)
        if type is None:
            break
        if typedefs[type] is not None:
            raise SyntaxError("Typedef redef")
        typedefs[type] = adaptor
        if module is not None:
            emit_bytecode_for_import_from(type, module, adaptor)
        else:
            emit_bytecode_for_assignment(type, adaptor)
    parse([line] + lines)
    ...

    def emit_float(...):
        if typedefs[FLOATTYPE] is not None:
            emit_constant(typedefs[FLOATTYPE][0], stringliteral)
        else:
            ... # standard behaviour

    def emit_list(...):
        if typedefs[LISTTYPE] is not None:
            emit(LOADGLOBAL, typedefs[LISTTYPE])
        # standard behaviour
        if typedefs[LISTTYPE] is not None:
            emit(CALL_FUNCTION)
        # standard behaviour

All rights are assigned to the Python Software Foundation
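The translation described above (float literals wrapped as ADAPTOR("source text")) can be approximated today without touching the compiler. What follows is only a rough sketch in present-day Python 3, not the proposed implementation: it rewrites source text with the standard tokenize module before it is compiled. The function name wrap_float_literals and its adaptor_name parameter are made up for this illustration.

```python
import io
import tokenize
from decimal import Decimal


def wrap_float_literals(source, adaptor_name="Decimal"):
    """Return `source` with each float literal rewritten as ADAPTOR('literal text').

    Hypothetical helper: it approximates the proposed typedef translation as a
    source-to-source rewrite and is not part of any real compiler.
    """
    result = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for toknum, tokval, _, _, _ in tokens:
        lowered = tokval.lower()
        is_float = (
            toknum == tokenize.NUMBER
            and not lowered.startswith(("0x", "0b", "0o"))  # skip hex/binary/octal ints
            and not lowered.endswith("j")                    # skip imaginary literals
            and ("." in lowered or "e" in lowered)
        )
        if is_float:
            # Pass the literal's original spelling as a string, so no precision is lost.
            result.extend([
                (tokenize.NAME, adaptor_name),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tokval)),
                (tokenize.OP, ")"),
            ])
        else:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)


module_source = "i = 1.000000000000000000000000000001\nb = [1.1, 4.2]\n"
namespace = {"Decimal": Decimal}
exec(wrap_float_literals(module_source), namespace)
```

After the exec call, namespace["i"] and the elements of namespace["b"] are Decimal instances built from the literal text. The rewrite applies to the whole source string it is given, which loosely mirrors the module-wide scope of the proposal.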

Taro wrote:
The problem I see with this is, what if you need to use both decimals and floats together?

I've often thought that there should be a shorter spelling for decimal numbers, but I was thinking that a simple suffix letter would suffice:

    mydecimal = 1.0000000000000001d

(This assumes of course that the compiler knows how to form a decimal constant, although the actual construction of the constant could be deferred until runtime.)

And if you really are suffering repetitive strain injury from having to type 'OrderedDict' 100 times in your code, it seems to me that you could just avoid creating each one individually, and instead have an array of inputs which gets converted to an array of OrderedDict.

In other words, one generally doesn't see code where the same repeated item is assigned to 100 individual variables; Most programmers, when they see more than half a dozen similar items, will start thinking about ways in which they can roll up the definitions into an algorithm to generate them.

-- Talin

On 1/28/08, Talin <talin@acm.org> wrote:
> Taro wrote:
>> Built-in types such as float, string, or list are first-class citizens ...
>> myfloat = 1.0

> The problem I see with this is, what if you need to use both decimals and floats together?

Then don't overwrite the float type. :D

> I've often thought that there should be a shorter spelling for decimal numbers, but I was thinking that a simple suffix letter would suffice:
> mydecimal = 1.0000000000000001d

That solves a slightly different problem. It expands the set of known types to include Decimal, but it doesn't let you say: For this run, make all strings be instances of MyString, which is a string subclass with extra logging to help me debug.
When I have wanted this, I didn't have an array; I had existing code with string literals sprinkled throughout. I wanted to minimize (diff-visible) changes because I knew I would back them out once the debugging or testing were done. -jJ
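As an illustration of the "logging string" use case, here is a minimal sketch of the kind of str subclass that could be meant. MyString is only named in the message, never defined, so everything in the body below is an assumption made for this example (Python 3).

```python
import logging

logger = logging.getLogger("mystring")


class MyString(str):
    """A str subclass that logs every construction (hypothetical debugging aid)."""

    def __new__(cls, value=""):
        obj = super().__new__(cls, value)
        # Record each creation, so literals routed through MyString become traceable.
        logger.debug("MyString created: %r", value)
        return obj


s = MyString("my string")
shouted = s.upper()  # str methods return a plain str, not a MyString
```

Note that results of the inherited methods fall back to plain str, so a real debugging aid would probably need to override the methods it cares about.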

Talin, hi

On Jan 29, 2008 2:44 PM, Talin <talin@acm.org> wrote:
If for some reason you need exactly even quantities of decimals and floats then at least you're no worse off than you are now -- just don't typedef.
-T. (*eye of the beholder blah blah blah ;-)

Taro writes:
As a matter of presenting the PEP in the best light, this isn't an issue of "second-class" to me, for one. ISTM that it's simply a matter of (a) giving syntax to constructions that are used a lot in control structures, and (b) the historical fact that other types that have syntax defined for them came first and were added as built-ins. Some of them possibly shouldn't have syntax. As a strawman example, many Python programs use floats very rarely, and we could conceivably -- but not *plausibly* -- demote float and require "my_approximate_pi = float('3.14')".
This is very persuasive, but may not be enough. Jim's suggestion of a "logging string" was also interesting.
The syntax for a typedef is: from MODULE typedef ADAPTOR as TYPE
I'm sorry, but I hate this term "typedef". This is not a "type definition" in any Pythonic sense of the word "type", and the associations with C-style typedefs are painful. How about

    from MODULE import STRINGCONVERTER with TYPE readsyntax

or

    from MODULE import STRINGCONVERTER with readsyntax TYPE

or

    from MODULE import STRINGCONVERTER readsyntax TYPE

?
a/ definition of adaptors would have to be allowed pre-typedef, which would allow them to be buried in code, making them far easier to miss;
I don't see why this is a problem. An adaptor for a scalar TYPE is just a converter from string to that TYPE. If you have such a converter (eg, because you're using Python as the platform for translating another language), why not just use it here?

Since the way this would work is that each TYPE would have a string-to-TYPE-value converter, you would just do (inside the compiler)

    TYPE.stringconverter = STRINGCONVERTER

and if the compiler encountered a literal with an undefined .stringconverter, it would generate an error to the effect of

    Use of TYPE stringconverter 'STRINGCONVERTER' before definition.

I think this would not be a problem in practice (except for typos) because you'd write decimal.py like this:

    ### decimal.py --- class Decimal for multiprecision decimal arithmetic

    from decimal import string_to_Decimal with float readsyntax

    class Decimal:
        ...

    def string_to_Decimal (astring):
        ...

    # Decimal constants
    pi = 3.1415926...

    ### decimal.py ends here

Other comments:

One problem I see with this proposal is that for non-scalar TYPEs, in general a user may want to take over the whole process of parsing TYPE literals. What I don't see in your proposal is what restrictions you want to put on that. It seems like what you have in mind for a MyList is to convert from a list, but what if in your code you mostly want floats to be floats, but in MyLists they should be Decimals? This would mean loss of precision:

    string -> list of floats -> MyList of Decimals

Don't take the example too seriously, we can discuss later whether there would be real use cases like this. What I want to know is how you see this case working. Take over parsing the whole string? Or let the list type parse the string, and then convert? In the latter case, I think EIBTI applies. The former seems rather unlikely to fly as a PEP.
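The loss of precision in the "string -> list of floats -> MyList of Decimals" path can be seen directly with the standard decimal module. This is a small Python 3 sketch; the variable names are chosen for the example only.

```python
from decimal import Decimal

literal = "1.0000000000000000000000000001"

# Path 1: the literal is first parsed as a float, then converted to Decimal.
via_float = Decimal(float(literal))

# Path 2: the literal text is handed to Decimal directly.
direct = Decimal(literal)

lost_precision = via_float != direct
```

Here via_float equals Decimal(1), because a binary float cannot hold the trailing digit, while direct keeps every digit of the literal.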
I know it's considered polite to provide code, but any implementation would have to be in C,
No, there's PyPy.
You might as well omit this; it doesn't really even help answer my question above. (IANALL but ISTM that) an issue here is at what point does the compiler learn that a syntactic list is actually a literal, and your code doesn't help indicate that, or whether it would need to differ from the current compiler, either.

Stephen, hi, and thanks for the critique.

On Jan 30, 2008 8:59 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
If I understand your intent correctly, as things stand this would ordinarily be a runtime NameError, and extra code would need to be added to the compiler to keep track of that:

    from decimal import Decimal with replacingsyntax float as float
    x = 1.01 # SyntaxError here???
    def Decimal():....
    class MyList(list):
        def __init__(self, inlist):
            for elem in inlist:
                try:
                    elem = Decimal(elem._strvalue)
                except AttributeError:
                    pass
                self.append(elem)

    ## main.py
    from mycollections import MyListFloatHelper with readsyntax float
    from mycollections import MyList with readsyntax list

    def foo():
        myfloat = 1.000000000000000000000000000001
        mydlist = [1.0000000000000000000000000001]
        mydecimal = mydlist[0]
        print myfloat,"/", mydecimal
        #==> 1.0 / Decimal("1.0000000000000000000000000001")
        print type(myfloat), "/", type(mydecimal)
        #==> <class 'mycollections.MyListFloatHelper'> / <class 'decimal.Decimal'>
It would learn that a token is a literal at the same time it does now. Determination of the need for literal replacements needs to be made at the compilation stage so that stringliterals can be provided and extra opcodes emitted (?an AST Visitor would be too late since the original literals are already converted?)

    def foo():
        myfloat = 1.000000000000000000000000000001
        mydecimallist = [1.0000000000000000000000000001]
        mydecimal = mydecimallist[0]

    dis.dis(foo) # Currently
    #==>   2           0 LOAD_CONST               1 (1.0)
    #==>               3 STORE_FAST               0 (myfloat)
    #==>
    #==>   3           6 LOAD_CONST               1 (1.0)
    #==>               9 BUILD_LIST               1
    #==>              12 STORE_FAST               1 (mydecimallist)
    #==>
    #==>   4          15 LOAD_FAST                1 (mydecimallist)
    #==>              18 LOAD_CONST               2 (0)
    #==>              21 BINARY_SUBSCR
    #==>              22 STORE_FAST               2 (mydecimal)
    #==>              25 LOAD_CONST               0 (None)
    #==>              28 RETURN_VALUE

    dis.dis(foo) # With MyListFloatHelper and MyList
    #==>   2           0 LOAD_GLOBAL              0 (MyListFloatHelper)
    #==>               3 LOAD_CONST               1 ('1.0000000000000000000000000001')
    #==>               6 CALL_FUNCTION            1
    #==>               9 STORE_FAST               0 (myfloat)
    #==>
    #==>   3          12 LOAD_GLOBAL              1 (MyList)
    #==>              15 LOAD_GLOBAL              0 (MyListFloatHelper)
    #==>              18 LOAD_CONST               1 ('1.0000000000000000000000000001')
    #==>              21 CALL_FUNCTION            1
    #==>              24 BUILD_LIST               1
    #==>              27 CALL_FUNCTION            1
    #==>              30 STORE_FAST               1 (mydecimallist)
    #==>
    #==>   4          33 LOAD_FAST                1 (mydecimallist)
    #==>              36 LOAD_CONST               2 (0)
    #==>              39 BINARY_SUBSCR
    #==>              40 STORE_FAST               2 (mydecimal)
    #==>              43 LOAD_CONST               0 (None)
    #==>              46 RETURN_VALUE
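On the question of whether an AST pass comes too late: in present-day Python 3 (3.8 or later, which postdates this thread), ast.get_source_segment can recover the original spelling of a literal from the source text, so an ast.NodeTransformer can emulate the call pattern shown in the hypothetical bytecode above. The sketch below is an approximation under that assumption, not the proposed compiler change; LiteralWrapper and the empty MyList stand-in are invented for the example.

```python
import ast
from decimal import Decimal


class MyList(list):
    """Stand-in for the MyList collection used in the example above."""


class LiteralWrapper(ast.NodeTransformer):
    """Wrap float literals and list displays in calls to named adaptors (sketch)."""

    def __init__(self, source, float_adaptor, list_adaptor):
        self.source = source
        self.float_adaptor = float_adaptor
        self.list_adaptor = list_adaptor

    def _call(self, name, arg):
        return ast.Call(func=ast.Name(id=name, ctx=ast.Load()), args=[arg], keywords=[])

    def visit_Constant(self, node):
        if type(node.value) is float:
            # Recover the literal's original text, so no digits are lost.
            text = ast.get_source_segment(self.source, node)
            return ast.copy_location(
                self._call(self.float_adaptor, ast.Constant(value=text)), node)
        return node

    def visit_List(self, node):
        self.generic_visit(node)  # rewrite the elements first
        if isinstance(node.ctx, ast.Load):  # leave assignment targets alone
            return ast.copy_location(self._call(self.list_adaptor, node), node)
        return node


source = (
    "myfloat = 1.000000000000000000000000000001\n"
    "mydlist = [1.0000000000000000000000000001]\n"
)
tree = LiteralWrapper(source, "Decimal", "MyList").visit(ast.parse(source))
ast.fix_missing_locations(tree)
namespace = {"Decimal": Decimal, "MyList": MyList}
exec(compile(tree, "<typedef-sketch>", "exec"), namespace)
```

As in the hypothetical disassembly, the adaptors are looked up by name when the code runs, so they must already be bound in the module's namespace at that point.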

Taro writes:
That is exactly my intent. A more explicit error message than "NameError: name 'myTypeLexer' not defined" would be nice, although if you give the function a reasonably explicit name it should be clear enough. However, a real problem with this idea is that I don't know if the compiler knows how to call the functions it has just compiled! Also, as syntax in general this kind of order dependence would be considered unPythonic, I suspect.
No, it would be a NameError: the name of the implicitly called converter is not defined in scope. The syntax is correct.
For this, the "solution" to keep precision would be along the lines of:
You've missed the point. I want myfloat to be a Python builtin float for some reason. Only in the context of a dlist do I want that read syntax to be parsed as a Decimal. You can just require that literal read syntax replacements be global, I guess, but you need to say something about whether this is going to be possible in the proposal.
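The NameError point made earlier in this message can be checked with the translated form of a typedeffed literal: compiling it succeeds, since the converter is just a name to the compiler, and the failure only shows up when the code runs without that name bound. This is a minimal Python 3 sketch; the string in source stands for what the compiler would hypothetically generate.

```python
from decimal import Decimal

# What "x = 1.01" would hypothetically be translated to.
source = "x = Decimal('1.01')\n"

# Compilation succeeds: Decimal is only a global name lookup at this stage.
code = compile(source, "<typedef-sketch>", "exec")

try:
    exec(code, {})  # the converter is not bound in this namespace
    outcome = "ran"
except NameError:
    outcome = "NameError"

# With the converter bound, the same code object runs normally.
namespace = {"Decimal": Decimal}
exec(code, namespace)
```

So a dedicated compile-time diagnostic, as discussed above, would need extra bookkeeping in the compiler; by default the error is the ordinary runtime one.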

"Taro" <taroso@gmail.com> wrote in message news:fefcdfb70801272201y10d57ebdo39d559ca5bc6e8a2@mail.gmail.com...
| Theoretical Performance Differences: Since a typedef is purely
| syntactic sugar, and all transformational work would be done at
| compilation,

I think this is a key issue which, unfortunately, I believe, works against your proposal. If the alternative constructor is a builtin C function, then yes, the transformation could be done at compile time. But I think this is much more difficult for one written in Python. As far as I know, the byte-code interpreter is not normally running during compilation. From __future__ imports work precisely because they are not really imports. But alternate constructor imports would have to be. I don't know enough, though, to know if this could be made to work.

tjr

On Mon, 28 Jan 2008 17:01:19 +1100, Taro wrote:

[typedef PEP snipped]

If I understood your proposal correctly, it's just about dynamically *replacing* the built-in types (with their respective syntax) with some type of your own.

I doubt this will ever be accepted (apart from the technical difficulties) since *changing* built-in types *in-place* has been disallowed for exactly the same reason: It could cause weird behaviour where you expected some **agreed on, built-in** behaviour without having an overly obvious cause.

Cheers,

participants (6)
- Jim Jewett
- Stargaming
- Stephen J. Turnbull
- Talin
- Taro
- Terry Reedy