
This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery. I explored the convertible literals a while ago, and I'm pretty sure that doesn't work in a duck-typed language. But the C++ design does work, as long as you're willing to have the conversion (including the lookup of the conversion function itself) done at runtime. Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`. Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice. Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from … import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal. Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary. Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. I've built a quick&dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it. My feeling is that this would be useful, but the problems are not surmountable without much bigger changes, and there's no obvious better design that avoids them. But I'm interested to see what others think.

* Andrew Barnert via Python-ideas <python-ideas@python.org> [2015-06-02 12:03:25 -0700]:
This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
I actually had the exact same thing in mind recently, and never brought it up because it seemed too crazy to me. It seems I'm not the only one! :D
Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
I think a big issue is that it's non-obvious syntactic sugar. You wouldn't expect 1.2x to actually be a function call, and for newcomers this might be rather confusing...
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
That actually was the use-case I had in mind. I think {'spam': 1, 'eggs': 2}_o is less ugly (and less error-prone!) than OrderedDict([('spam', 1), ('eggs': 2)]) Also, it's immediately apparent that it is some kind of dict.
I've built a quick&dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it.
Wow! I'm always amazed at how malleable Python is. Florian -- http://www.the-compiler.org | me@the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/

On Jun 2, 2015, at 12:33, Florian Bruhin <me@the-compiler.org> wrote:
Well, newcomers won't be creating user-defined literals, so they won't have to even know there's a function call (unless whoever wrote the library that supplies them has a bug).
Well, I suppose that's one advantage of the literals being user-defined: you can use _o in your project, and I can not use it. :) But you still have to deal with the other issue I mentioned if you want to extend it to collection literals: again, they aren't really literals, or even easy to define except as "displays that aren't comprehensions". A quick hack like this is actually pretty easy to write (especially because in a quick hack, who cares whether using it on a comprehension gives the wrong error, or accidentally "works"); a real design and implementation may be harder.
Also, it's immediately apparent that it is some kind of dict.
That is a good point. Not that it isn't immediately apparent that OrderedDict(…) is some kind of dict as well... But compared to Swift using ArrayLiteralConvertible to define sets or C++ using array-like initializer lists to do the same thing, this is definitely not as bad.

On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.
There's probably no solution to the literal_imal problem, but the easiest fix for literal_ump is to have 21j be parsed the same way - it's a 21 modified by j, same as 21jump is a 21 modified by jump.
Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from … import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.
I'd much rather see it be done at compile time. Something like this: compile("x = 1d/10", "<>", "exec") would immediately call literal_d("1") and embed its return value in the resulting code as a literal. (Since the peephole optimizer presumably doesn't currently understand Decimals, this would probably keep the division, but if it got enhanced, this could end up constant-folding to Decimal("0.1") before returning the code object.) So it's only the compilation step that needs to know about all those literal_* functions. Should there be a way to globally register them for default usage, or is this too much action-at-a-distance?
Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.
I'd be inclined to simply always provide a string. The special case would be that the quotes can sometimes be omitted, same as redundant parens on genexprs can sometimes be omitted. Otherwise, 1.2d might still produce wrong results.
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
I thought there was no such thing as a dict/list/set literal, only display syntax? In any case, that can always be left for a future extension to the proposal. ChrisA

On 6/2/2015 3:40 PM, Chris Angelico wrote:
On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas
Correct. Only number and string literals. Displays are atomic runtime expressions. 'expression_list' and 'comprehension' are alternate contents of a display. 6.2.4. Displays for lists, sets and dictionaries -- Terry Jan Reedy

On Jun 2, 2015, at 12:40, Chris Angelico <rosuav@gmail.com> wrote:
Thanks; I should have thought of that--especially since that's exactly how C++ solves similar problems. (Although reserving all suffixes that don't start with an underscore for the implementation's use doesn't hurt...)
It would definitely be nicer to have it done at compile time if possible. I'm just not sure there's a good design that makes it possible. In particular, with your suggestion (which I considered), it seems a bit opaque to me that 1.2d is an error unless you _or some other module_ first imported decimalliterals; it's definitely more explicit if you (not some other module) have to from decimalliterals import literal_d. (And when you really want to be implicit, you can inject it into other modules or into builtins, the same as any other rare case where you really want to be implicit.) But many real projects are either complex enough to need centralized organization or simple enough to fit in one script, so maybe it wouldn't turn out too "magical" in practice.
Yes, that's what I thought too. The only real use case C++ has for this is allowing the same suffix to mean different things for different types, which I think would be more of a bug magnet than a feature if anyone actually did it...
That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.

On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Yeah. The significance is that literals get snapshotted into the code object as constants and simply called up when they're needed, but displays are executable code:
My understanding of "literal" is something which can be processed entirely at compile time, and retained in the code object, just like strings are. Once the code's finished being compiled, there's no record of what type of string literal was used (raw, triple-quoted, etc), only the type of string object (bytes/unicode). Custom literals could be the same - come to think of it, it might be nice to have pathlib.Path literals, represented as p"/home/rosuav" or something. In any case, they'd be evaluated using only compile-time information, and would then be saved as constants. That implies that only immutables should have literal syntaxes. I'm not sure whether that's significant or not. ChrisA

On Jun 2, 2015, at 18:05, Chris Angelico <rosuav@gmail.com> wrote:
The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.) Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway. This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.
But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here. And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
But pathlib.Path isn't immutable. Meanwhile, that reminds me: one of the frequent selling points for Swift's related feature is for NSURL literals (which Cocoa uses for local paths as well as remote resources); I should go through the Swift selling points to see if they've found other things that the C++ community hasn't (but that can be ported to the C++ design, and that don't depend on peculiarities of Cocoa to be interesting).

On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Yes, it's a bit tricky. Part of the confusion comes from the peephole optimizer; "1+2j" looks like a constant, but it's actually a compile-time expression. It wouldn't be a big problem to have an uber-specific definition of "literal" that cuts out things like that; for the most part, it's not going to be a problem (eg if you define a fractions.Fraction literal, you could use "1/2frac" or "1frac/2" and you'd get back Fraction(1, 2) either way, simply because division of Fraction and int works correctly; you could even have a "mixed number literal" like "1+1/2frac" and it'd evaluate just fine).
Well, an additional parameter to compile() would do it. I've no idea how hard it is to write an import hook, but my notion was that you could do it that way and alter the behaviour of the compilation process. But I haven't put a lot of thought into implementation, nor do I know enough of the internals to know what's plausible and what isn't.
And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
It's to do with expectations. A literal should simply be itself, nothing else. When you have a string literal in your code, nothing can change what string that represents; at compilation time, it turns into a string object, and there it remains. Shadowing the name 'str' won't affect it. But if something that looks like a literal ends up being a function call, it could get extremely confusing - name lookups happening at run-time when the name doesn't occur in the code. Imagine the traceback: def calc_profit(hex): decimal = int(hex, 16) return 0.2d * decimal
Uhh... what? Sure, I shadowed the module name there, but I'm not *using* the decimal module! I'm just using a decimal literal! It's no problem to shadow the built-in function 'hex' there, because I'm not using the built-in function! Whatever name you use, there's the possibility that it'll have been changed at run-time, and that will cause no end of confusion. A literal shouldn't cause surprise function calls and name lookups.
Huh, it isn't? That's a pity. In that case, I guess you can't have a path literal. In any case, I'm sure there'll be other string-like things that people can come up with literal syntaxes for. ChrisA

On Jun 2, 2015, at 20:12, Chris Angelico <rosuav@gmail.com> wrote:
I don't understand what you mean. Sure, you can pass the magic registry a separate argument instead of leaving it in the local/global environment, but that doesn't really change anything.
It's not _that_ hard to write an import hook. But what are you going to do in that hook? If you're trying to change the syntax of Python by adding a new literal suffix, you have to rewrite the parser. (My hack gets around that by tokenizing, modifying the token stream, untokenizing, and compiling. But you don't want to do that in real life.) So I assume your idea means something like: first we parse 2.3d into something like a new UserLiteral AST node, then if no hook translates that into something else before the AST is compiled, it's a SyntaxError? But that still means: * If you want to use a user-defined literal, you can't import it; you need another module to first import that literal's import hook and then import your module. * Your .pyc file won't get updated when that other module changes the hooks in place when your module gets imported. * That's a significant amount of boilerplate for each module that wants to offer a new literal. * While it isn't actually that hard, it is something most module developers have no idea how to write. (A HOWTO could maybe help here....) * Every import has to be hooked and transformed once for each literal you want to be available. Meanwhile, what exactly could the hook _do_ at compile time? It could generate the expression `Decimal('1.2')`, but that's no more "literal" than `literal_d('1.2')`, and now it means your script has to import `Decimal` into its scope instead. I suppose your import hook could push that import into the top of the script, but that seems even more magical. Or maybe you could generate an actual Decimal object, pickle it, compile in the expression `pickle.loads(b'cdecimal\nDecimal\np0\n(V1.2\np1\tp2\nRp3\n.')`, and push in a pickle import, but that doesn't really solve anything. Really, trying to force something into a "compile-time computation" in a language that doesn't have a full compile-time sub-language is a losing proposition. C++03 had a sort of accidental minimal compile-time sub-language based on template expansion and required constant folding for integer and pointer arithmetic, and that really wasn't sufficient, which is why C++11 and D both added ways to use most of the language explicitly at compile time (and C++11 still didn't get it right, which is why C++14 had to redo it). In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
But that _can't_ happen with my design: the `0.2d` is compiled to `literal_d('0.2')`. The call to `decimal.Decimal` is in that function's scope, so nothing you do in your function can interfere with it. Sure, you can still redefine `literal_d`, but (a) why would you, and (b) even if you do, the problem will be a lot more obvious (especially since you had to explicitly `from decimalliterals import literal_d` at the top of the script, while you didn't have to even mention `decimal` or `Decimal` anywhere). But your design, or any design that does the translation at compile time, _would_ have this problem. If you compile `0.2d` directly into `decimal.Decimal('0.2')`, then it's `decimal` that has to be in scope. Also, notice that my design leaves the door open for later coming up with a special bytecode to look up translation functions following different rules (a registry, an explicit global lookup that ignores local shadowing, etc.); translating into a normal constructor expression doesn't.
I don't understand why you think this is important. Literal values, compile-time-computable/accessible values, and run-time-constant values are certainly not unrelated, but they're not the same thing. Other languages don't try to force them to be the same. In C++, for example, a literal has to evaluate into a compile-time-computable expression that only uses constant compile-time-accessible values, but the value it doesn't have to be constant at runtime. In fact, it's quite common for it not to be.
In any case, I'm sure there'll be other string-like things that people can come up with literal syntaxes for.

On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
That's exactly the thing: 1.2d should be atomic. It should not be an expression. The three examples you gave are syntactically expressions, but they act very much like literals thanks to constant folding:
which means they behave the way people expect them to. There is no way for run-time changes to affect what any of those expressions yields. Whether you're talking about shadowing the name Decimal or the name literal_d, the trouble is that it's happening at run-time. Here's another confusing case: import decimal from fractionliterals import literal_fr # oops, forgot to import literal_d # If we miss off literal_fr, we get an immediate error, because # 1/2fr gets evaluated at def time. def do_stuff(x, y, portion=1/2fr): try: result = decimal.Decimal(x*y*portion) except OverflowError: return 0.0d You won't know that your literal has failed until something actually triggers the error. That is extremely unobvious, especially since the token "literal_d" doesn't occur anywhere in do_stuff(). Literals look like atoms, and if they behave like expressions, sooner or later there'll be a ton of Stack Overflow questions saying "Why doesn't my code work? I just changed this up here, and now I get this weird error". Is that how literals should work? No. ChrisA

On Jun 3, 2015, at 14:48, Chris Angelico <rosuav@gmail.com> wrote:
But that's not something that's guaranteed by Python. It's something that implementations are allowed to do, and that CPython happens to do. If user code actually relied on that optimization, that code would be nonportable. But the reason Python allows that optimization in the first place is that user code actually doesn't care whether these expressions are evaluated "atomically" or at compile time, so it's ok to do so behind users' backs. It's not surprising because no one is going to monkeypatch int.__neg__ between definition time and call time (which CPython doesn't, but some implementations do), or call dis and read the bytecode if they don't even understand what a compile-time optimization is, and so on.
If that's a problem, then you're using the wrong language. You also won't know that you've typo'd OvreflowError or reslt, or called d.sqrt() instead of decimal.sqrt(d), or all kinds of other errors until something actually triggers the error. Which means either executing the code, or running a static linter. Which would be exactly the same for 1.2d.
That is extremely unobvious, especially since the token "literal_d" doesn't occur anywhere in do_stuff().
This really isn't going to be confusing in real life. You get an error saying you forgot to define literal_d. You say, "Nuh uh, I did define it right at the top, same way I did literal_fr, in this imp... Oops, looks like I forgot to import it".
Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one. Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages. And if you're going to suggest "what if I just redefine literal_d for no reason", ask yourself who would ever do that? Redefining decimal makes sense, because that's a reasonable name for a variable; redefining literal_d is as silly as redefining __name__. (But if you think those are different because double underscores are special, I suppose __literal_d__ doesn't bother me.)

On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one.
Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages.
Anything that causes a different code path to be executed can do this. ChrisA

On Jun 3, 2015, at 16:40, Chris Angelico <rosuav@gmail.com> wrote:
Well, any expression causes a different code path to be executed than any different expression, or what would be the point? But how is this relevant here? Is there an example where 1.2d would lead to "changing this up here gives this weird error somewhere else" that doesn't apply just as well to spam.eggs (or that's relevant or likely to come up or whatever in the case of 1.2d but not in the case of spam.eggs)? Otherwise, you're just presenting an argument against dynamic languages--or maybe even against programming languages full stop (after all, the same kinds of things can happen in Haskell or C++, they just often happen at compile time, so you get to debug the same "weird error" earlier).

On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote:
The problem is that Python doesn't really define what it means by "literal" anywhere,
The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". I should have all the information needed to write a new implementation. It starts "Literals are notations for constant values of some built-in types." The relevant subsections are: 2.4.1. String and Bytes literals 2.4.2. String literal concatenation 2.4.3. Numeric literals 2.4.4. Integer literals 2.4.5. Floating point literals 2.4.6. Imaginary literals
and the documentation is not consistent.
I'd call it a bit sloppy in places.
Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people.
Again, the Language Reference seems sufficiently explicit and detailed to write another implementation. 2.4.3 says "There are three types of numeric literals: integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator ‘-‘ and the literal 1." I will let you read the three specific subsections
This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere.
I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex. There are 3 type of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0." Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float. -- Terry Jan Reedy

I think this is off-topic, but it's important enough to answer anyway. On Jun 2, 2015, at 21:48, Terry Reedy <tjreedy@udel.edu> wrote:
No, that defines what literals mean for the purpose of lexical analysis.
It starts "Literals are notations for constant values of some built-in types."
By the rules in this section, ..., None, True, and False are not literals, even though they are called literals everywhere else they appear in the documentation except for the Lexical Analysis chapter. In fact, even within that chapter, in 2.6 Delimiters, it explains that "A sequence of three periods has a special meaning as an ellipsis literal." By the rules in this section, "-2" is not a literal, even though, e.g., in the data model section it says "co_consts is a tuple containing the literals used by the bytecode", and in every extant Python implementation -2 will be stored in co_consts. By the rules in this section, "()" and "{}" are not literals, even though, e.g., in the set displays section it says "An empty set cannot be constructed with {}; this literal constructs an empty dictionary." And so on. And that's fine. None of those things are literals for the purpose of lexical analysis, even though they are things that represent literal values. And using the word "literal" somewhat loosely isn't confusing anywhere. Where a more specific definition is needed, as when documenting the lexical analysis phase of the language, a specific definition is given. And this is what allows ast.literal_eval to refer to "the following Python literal structures: strings, bytes, numbers, tuples, dicts, sets, booleans, and None" instead of having to say "the following Python literal structures: strings, bytes, and numbers; the negation of a literal number; the addition or subtraction of a non-imaginary literal number and an imaginary literal number; expression lists containing at least one comma; empty parentheses; the following container displays when not containing comprehensions: lists, dicts, sets; the keywords True, False, and None". I don't think that's a bad thing. If you want to know what the "literal structure... None" means, it's easy to find out, and the fact that None is tokenized as a keyword rather than as a literal does not hamper you in any way. If you actually need to write a tokenizer, then the fact that None is tokenized as a keyword makes a difference--and you can find that out easily as well.
and the documentation is not consistent.
I'd call it a bit sloppy in places.
I wouldn't call it sloppy. I'd call it somewhat loose and informal in places, but that's often a good thing.
But it doesn't confuse people in any relevant way. The user who asked that question had no problem figuring out how to interpret code that includes a (), or even how that code should be and is compiled. He could have written a Python interpreter with the knowledge he had. Maybe he couldn't have written a specification, but who cares? He doesn't need to.
Yes, and they should also be able to tell that the integer literal "42" should evaluate to an int whose value is equal to 42, and that "the value may be approximated in the case of floating point" means that the literal "1.2" should evaluate to the float whose value is closest to 1.2 rather than some different approximation, and so on. But the documentation doesn't actually define any of that. It doesn't have to, because it assumes it's being read by a non-idiot who's capable of programming Python (and won't deliberately make stupid decisions in interpreting it just because he's technically allowed to). The C++ specification defines all of that, and more (that the digits are interpreted with the leftmost as most significant, that the runtime value of an integer literal is not an lvalue, that it counts as a compile-time constant value, and so on). It attempts to make no assumptions at all (and there have been cases where C++ compiler vendors _have_ made deliberately obtuse interpretations just to make a point about the standard). That's exactly why reference documentation is more useful than a specification: because it leaves out the things that should be obvious to anyone capable of programming Python. To learn how integer literals work in Python, I need to look at two short and accessible paragraphs; to learn how integer literals work in C++, I have to read 2 full-page sections plus parts of at least 2 others, all written in impenetrable legalese.

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.
Are there any use cases besides decimals? Wouldn't it be easier to just add, say, a fixed "0d" prefix for decimals? 0x1001 # hex 0b1001 # binary 0d1.001 # decimal
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
Also there's the idea floating around of making *all* dicts ordered (as PyPy has done), which would be much cleaner if it can be managed, so I'm guessing that would have to be tried and fail before any new syntax would be added for this use case. -n -- Nathaniel J. Smith -- http://vorpus.org

On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
In terms of other included useful options, you also have fractions. There could also be benefit of using such a system for cases of numbers with units, such as having the language understand 23.49MB. That said, very similar results could be achieved in most cases by merely using a normal function, without the need for special syntax. Decimal and Fraction are probably the only two major cases where you will see any actual benefit, though there may be libraries that may provide other number formats that could benefit (perhaps a base-3 number?).
One benefit of the proposal is that it can be readily generalized to all literal syntax, so custom behaviors for native support of ordered dicts, trees, ordered sets, multi-sets, counters, and so forth could all be added via libraries, with little to no additional need for Python to be updated to support them directly. All-in-all, I'd be very mixed on such a feature. I can see plenty of cases where it would provide benefit, however it also adds quite a bit of complexity to the language, and could easily result in code with nasty action-at-a-distance issues. If such a feature were implemented, Python would probably also want to reserve some set of the names for future language features, similar to how dunder names are reserved.

On Jun 2, 2015 1:32 PM, "Chris Kaynor" <ckaynor@zindagigames.com> wrote:
with units, such as having the language understand 23.49MB. The unit libraries I've seen just spell this as "23.49 * MB" (or "22.49 * km / h" for a speed, say). And crucially they don't have any need to override the parsing rules for float literals. -n

On Jun 2, 2015, at 12:40, Nathaniel Smith <njs@pobox.com> wrote:
Wouldn't it be easier to just add, say, a fixed "0d" prefix for decimals?
I suggested that on the other thread, but go back and read the first paragraph of this thread. We don't want a standard literal syntax for decimal.Decimal. Some users may want it for some projects, but they should have to do something explicit to get it. Meanwhile, a literal syntax for decimal64 would be very useful, but there's no such type in the stdlib, so anyone who wants it has to go get it on PyPI, which means the PyPI module, not Python itself, would have to supply the literal. And, since I don't know of any implementation of decimal64 without decimal32 and decimal128, I can easily imagine wanting separate literals for all three. And f or r for fraction came up in the other thread. Beyond that? I don't know. If you look at the C++ proposal (N2750) and the various blog posts written around 2008-2012, here's what comes up repeatedly, in (to me) decreasing order of usefulness in Python: * One or more decimal float types. * Custom string types, like a string that iterates graphemes clusters instead of code units (Java and Swift have these; I don't know of an implementation for Python), or a mutable rope-based implementation, or the bytes-that-knows-its-encoding type that Nick Coghlan suggested some time last year. * Integers specified in arbitrary bases. * Quaternions or other number-like types beyond complex. * Points or vectors represented as 3x + 4z. * Units. Which I'm not sure is a good idea. (200*km seems just as readable to me as 200km, and only the former extends in an obvious way to 200*km/sec...) And I think the same goes for similar things like CSS units (1.2*em seems as good as 1.2_em to me). * Various things Python already has (real string objects instead of char*, real Unicode strings, binary integers, arbitrary-precision integers, etc.). * Cases where a constructor call would actually be just as nice, except for some other deficiency of C++ (e.g., you can't use a constexpr constructor expression as a template argument in C++11). * Blatantly silly things, like integral radians or integral halfs (which people keep saying physicists could use, only for physicists to ask "where would I use that?").
Well, OrderedDict isn't the only container class, even in the stdlib. But the real point would be types outside the stdlib. You could construct a sorted dict using blist or SortedContainers without having to first construct a dict in arbitrary order and then copy-sort it. Or build a NumPy array without building a list. And so on. But, again, I think the additional problems with container literals (which, again, aren't really literals) mean it would be worth leaving this out of any 1.0 proposal (and if containers are the only good selling point for the whole thing, that may mean the whole thing isn't worth having).

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
You seem to suggest that the token should start with an underscore when you write 1.2_dec and {...}_o but not when you write 1.2d and 1.2jump. Requiring the underscore solves the ambiguity and would make literals more readable. I would also require an alphabetic character after the _ and prohibit _ inside the name to avoid confusion. 1.2_d => literal_d('1.2') 1.2j_ump => literal_ump('1.2j') 1.2_jump => literal_jump('1.2') 0x12dec_imal => literal_imal('0x12dec') 0x12_decimal => literal_decimal('0x12') "1.2"_ebcdic => literal_ebcdic('1.2') 1.2d => error 0x12decimal => error 1_a_b => error 1_2 => error I do think the namescape thing is an issue but requiring me to write from literals import literal_jump isn't necessarily that bad. Without an explicit import, how would I go about tracking down what exactly 21_jump means? The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value: (1, 3, 4)_xyzspace {'a': 1 + 2}_o {'a', 'b': 3}_o ("abc")_x ("abc", "def")_x "abc" "def"_x ("abc" "def")_x ("abc" "def",)_x --- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS)

On Jun 2, 2015, at 18:50, Bruce Leban <bruce@leban.us> wrote:
Well, I was suggesting leaving it up to the user who defines the literals. Sure, it's possible to come up with confusing suffixes, but if we can trust end users to name a variable that holds an XML tree "root" instead "_12rawbytes", can't we trust library authors to name their suffixes appropriately? I think you will _often_ want the preceding underscore, at least for multi-character suffixes, and you will _almost never_ want multiple underscores, or strings of underscores and digits without letters, etc. But that seems more like something for PEP8 and other style guides and checkers than something the language would need to enforce. However, I noticed that I left off the extra underscore in literal__dec, and it really does look pretty ugly that way, so... Maybe you have a point here.
Thanks; that's the argument I was trying to make and not making very well.
The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value:
At least two people suggested that it's better to just explicitly put that whole question of collection "literals" off for the future (assuming the basic idea of numeric and string literal suffixes is worth considering at all), and I think they're right.

On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote:
I'm torn. On the one hand, some sort of extensible syntax for literals would be nice. I say "nice" rather than useful because there are advantages and disadvantages and there's no way of really knowing which outweighs the other. But, really, your proposal is in no way, shape or form syntax for *literals*, it's a new syntax for an unary postfix operator or function. The whole point of something being a literal is that it is parsed and converted at compile time. Now you might (and do) say that worrying about this is "premature optimization", but call me a pedant if you like, I don't think we should call something a literal if it's a runtime function call. Otherwise, we might as well say that from fractions import Fraction Fraction(2) is a literal, in which case I can say your proposal is unnecessary as we already have user-specified literals in Python. I can think of some interesting uses for postfix operators, or literals, or whatever we want to call them: 45° 10!! 23.5d 3d6 35'24" 15ell I've deliberately not explained what I mean by each of them. You can probably guess some, or all, but I hope it demonstrates one problem with this suggestion. Like operator overloading, it risks making code less clear rather than more. -- Steve

On Jun 2, 2015, at 19:52, Steven D'Aprano <steve@pearwood.info> wrote:
That's exactly why I came up with something I could hack up without any changes to the interpreter. It means anyone can try it out and see whether the advantages outweigh the disadvantages for them. (Of course there are additional disadvantages to the hack in efficiency, hackiness, and possibly debugability, so it may unfairly bias people who don't keep that in mind--but if so, it can only bias them in the conservative direction of rejecting the idea, which I think is ok.)
But, really, your proposal is in no way, shape or form syntax for *literals*,
It's a syntax for things that are somewhat like `2`, more like `-2`, even more like `(2,)`, but still not exactly the same as even that. If you don't like using the word "literal" for that, you can come up with a different word. I called it a "literal" because "user-defined literals" is what people were asking for when they asked for `2.3d`, and it has clear parallels with a very similar feature with the same name in other languages. But I'm fine calling it something different, as long as people who are looking for it will know how to find it.
it's a new syntax for an unary postfix operator
That's fair; C++ in fact defines its user literal syntax in terms of special constexpr operator overloads, and points out the similarities to postfix operator++ in a note.
I don't think this is the right distinction. A literal is a notation for expressing some value that means what it says in a sufficiently simple way. That concept has significant overlap with "compile-time evaluable", and with "constant", but they're not the same concepts. And this is especially true for a language that doesn't define any compile-time computation phase. In Python, `-2` may be compiled to UNARY_NEGATIVE on the compiled-in constant value 2, or just to the compiled-in constant value -2, depending on what the implementation wants to optimize. Do you want to call it a literal in some implementations but not others? No reasonable user code that isn't reflecting on the internals is going to care, or even know, what the implementation is doing. Being "user-defined" means that the "sufficiently simple way" the notation gets its meaning has to involve user code. In a language with a compile-time computation phase like C++, that can mean "constexpr" user code, but Python doesn't define a "constexpr"-like phase. At any rate, again, if you want to call it something different, that's fine, as long as people looking for "what does `1.2d` mean in this program" or "how do I do the Python equivalent of a C++ user-defined literal" will be able to understand it.
In C++, a constructor expression like Fraction(2) may be evaluable at compile time, and may evaluate to something that's constant at both compile time and runtime, and yet it's still not a literal. Why? Because their rule for what counts as "sufficiently simple" includes constexpr postfix user-literal operators, but not constexpr function or constructor calls. I don't know of anyone who's confused by that. It's a useful (and intuitively useful) distinction, separate from the "constexpr" and "const" distinctions.
Sure. In fact, it's very closely analogous--both of them are ways to allow a user-defined type to act more like a builtin type, which can be abused to do completely different things instead. The C++ proposal specifically pointed out this comparison. I think the risk is lower in Python than in C++ just because Python idiomatically discourages magical or idiosyncratic programming much more strongly in general, and that means operator overloading is already used more consistently and less confusingly than in C++, so the same is more likely to be true with this new feature. But of course the risk isn't zero. Again, I'm hoping people will play around with it, come up with example code they can show to other people for impressions, etc., rather than trying to guess, or come up with some abstract argument. It's certainly possible that everything that looks like a good example when you think of it will look too magical to anyone who reads your code. Then the idea can be rejected, and if anyone thinks of a similar idea in the future, they can be pointed to the existing examples and asked, "Can your idea solve these problems?"

On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote:
Not really. It's a syntax for something that is not very close to *any* of those examples. Unlike all of those example, it is a syntax for calling a function at runtime. Let's take (-2, 1+3j) as an example. As you point out in another post, Python may constant-fold it, but isn't required to. Python 3.3 compiles it to a single constant: LOAD_CONST 6 ((-2, (1+3j))) but Python 1.5 compiles it to a series of byte-code operations: LOAD_CONST 0 (2) UNARY_NEGATIVE LOAD_CONST 1 (1) LOAD_CONST 2 (3j) BINARY_ADD BUILD_TUPLE 2 But that's just implementation detail. Whether Python 3.3 or 1.5, both expressions have something in common: the *operation* is immutable (I don't mean the object itself); there is nothing you can do, from pure python code, to make the literal (-2, 1+3j) something other than a two-tuple consisting of -2 and 1+3j. You can shadow int, complex and tuple, and it won't make a lick of difference. For lack of a better term, I'm going to call this a "static operation" (as opposed to dynamic operations like called len(x), which can be shadowed or monkey-patched). I don't wish to debate the definition of "literal", as that may be very difficult. For example, is 2+3j actually a literal, or an expression containing only literals? If a literal, how about 2*3**4/5 for that matter? As soon as Python compilers start doing compile-time constant folding, the boundary between literals and constant expressions becomes fuzzy. But that boundary is actually not very interesting. What is interesting is that every literal shares at least the property that I refer to above, that you cannot redefine the result of that literal at runtime by shadowing or monkey-patching. Coming from that perspective, a literal *defined* at runtime as you suggest is a contradiction in terms. I don't care so much if the actual operation that evaluates the literal happens at runtime, so long as it is static in the above sense. If it's dynamic, then it's not a literal, it's just a function call with ugly syntax.
If you asked for a turkey and cheese sandwich on rye bread, and I said "Well, I haven't got any turkey, or rye, but I can give you a slice of cheese on white bread and we'll just call it a turkey and cheese rye sandwich", you probably wouldn't be impressed :-)
A literal is a notation for expressing some value that means what it says in a sufficiently simple way.
I don't think that works. "Sufficiently simple" is a problematic concept. If "123_d" is sufficiently simply, surely "d(123)" is equally simple? It's only one character more, and it's a much more familiar and conventional syntax. Especially since *_d ends up calling a function, which might as well be called d(). And if it is called d, why not a more_meaningful_name() instead? I would hope that the length of the function name is not the defining characteristic of "sufficiently simple"? (Consider 123_somereallylongbutmeaningfulnamehere.) I don't wish to argue about other languages, but I think for Python, the important characteristic of "literals" is that they are static, as above, not "simple". An expression with nested containers isn't necessarily simple: {0: [1, 2, {3, 4, (5, 6)}]} # add arbitrary levels of complexity nor is it necessarily constructed as a compile-time constant, but it is static in the above sense. [...]
What is the logic for that rule? If it is just an arbitrary decision that "literals cannot include parentheses" then I equally arbitrarily dismiss that rule and say "of course they can, the C++ standard not withstanding, and the fact that Fraction(2) is a constant evaluated at compile time is proof of that fact". In any case, this is Python, and arguing over definitions from C++ is not productive. Our understanding of what makes a literal can be informed by other languages, but cannot be defined by other languages -- if for no other reason that other languages may not all agree on what is and isn't a literal. -- Steve

On 4 June 2015 at 13:08, Steven D'Aprano <steve@pearwood.info> wrote:
I think that the main reason that people keep asking for things like 1.2d in place of D('1.2') is basically that the use of a string literal, for some reason "feels different". It's not a technical issue, nor is it one of compile time constants or static values - it's simply about not wanting to *think* of the process as passing a string literal to a function. They want "a syntax for a decimal" rather than "a means of getting a decimal from a string" because that's how they think of what they are doing. People aren't asking for decimal literals because they don't know that they can do D('1.2'). They want to avoid the quotes because they don't "feel right", that's all. That's why the common question is "why doesn't D(1.2) do what I expect?" rather than "how do I include a decimal constant in my program?" "Literal" syntax is about taking a chunk of the source code as a string, and converting it into a runtime object. For built in types the syntax is known to the lexer and the compiler knows how to create the runtime constants (that applies as much to Python as to C or any other language). The fundamental question here is whether there is a Pythonic way of extending that to user-defined forms. That would have to be handled at runtime, so the *syntax* would need to be immutable, but the *semantics* could be defined in terms of runtime, without violating the spirit of the request. Such a syntax could be used for lots of things - regular expressions are a common type that gets dedicated syntax (Javascript, Perl). As a straw man how about a new syntax (this won't work as written, because it'll clash with the "<" operator, but the basic idea works): LITERAL_CALL = PRIMARY "<" <any source character except right angle bracket>* ">" which is a new option for PRIMARY alongside CALL. This translates directly into PRIMARY(str) where str is a string composed of the source characters within <...>. Decimal "literals" would then be from decimal import Decimal as D x = D<1.2> Code objects could be compile<x+1>. Regular expressions could be from re import compile as RE regex = RE<a.*([bc]+)$> As you can see the potential for line noise and unreadable code is there, but regular expressions always have that problem :-) Also, this proposal gives a "literal syntax" that works with existing features, rather than being a specialised add-on. Maybe that's a benefit (or maybe it's over-generalisation). Paul

On 4 June 2015 at 23:06, Paul Moore <p.f.moore@gmail.com> wrote:
The main idea I've had for compile time metaprogramming that I figured I might be able to persuade Guido not to hate is: python_ast, names2cells, unbound_names = !(this_is_an_arbitrary_python_expression) As suggested by the assignment target names, the default behaviour would be to compile the expression to a Python AST, and then at runtime provide some relevant information about the name bindings referenced from it. (I haven't even attempted to implement this, although I've suggested it to some of the SciPy folks as an idea they might want to explore to make R style lazy evaluation easier) By using the prefix+delimiters notation, it would become possible to later have variants that were similarly transparent to the compiler, but *called* a suitably registered callable at compile time to do the conversion to runtime Python objects. For example: !sh(shell command) !format(format string with implicit interpolation) !sql(SQL query) So for custom numeric types, you could register: d = !decimal(1.2) r = !rational(22/7) This isn't an idea I'm likely to have time to pursue myself any time soon (if ever), but I think it addresses the key concern with syntax customisation by ensuring that customisations have *names*, and that they're clearly distinguished from normal code. Cheers, Nick.

On 4 June 2015 at 14:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
The fundamental difference between this proposal and mine is (I think) that you're assuming an arbitrary Python expression in there (which is parsed), whereas I'm proposing an *unparsed* string. For example, your suggestion of !decimal(1.2) would presumably pass to the "decimal" function, an AST consisting of a literal float node for 1.2. Which has the same issues as anything else that parses 1.2 before the decimal constructor gets its hands on it - you've already lost the original that the people wanting decimal literals need access to. And I don't think your shell script example works - something like !sh(echo $PATH) would be a syntax error, surely? My proposal is specifically about allowing access to the *unevaluated* source string, to allow the runtime function to take control of the parsing. We have various functions already that take string representations and parse them to objects (Decimal, re.compile, compile...) - all I'm suggesting is a lighter-weight syntax than ("...") for "call with a string value". It's very hard to justify this, as it doesn't add any new functionality, and it doesn't add that much brevity. But it seems to me that it *does* add a strong measure of "doing what people expect" - something that's hard to quantify, but once you go looking for examples, it's applicable to a *lot* of longstanding requests. The more I look, the more uses I can think of (e.g., Windows paths via pathlib - Path<C:\Windows>). The main issue I see with my proposal (other than "Guido could well hate it" :-)) is that it has no answer to the fact that you can't include the closing delimiter in the string - as soon as you try to work around that, the syntax starts to lose its elegant simplicity *very* fast. (Raw strings have similar problems - the rules on backslashes in raw strings are clumsy at best). Like you, though, I don't have time to work on this, so it's just an idea if anyone else wants to pick up on it. Paul

On 4 June 2015 at 23:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ah, I see now what you meant. Apologies, I'd not fully understood what you were proposing. In which case yes, your proposal is strictly more powerful than mine.

You still have the same problem as me, that what's inside !xxx(...) cannot contain a ")" character. (Or maybe can't contain an unmatched ")", or an unescaped ")", depending on what restrictions you feel like putting on the form of the unparsed expression...) But I think that's fundamental to any form of syntax embedding, so it's not exactly a showstopper.

Paul

On Fri, Jun 5, 2015 at 2:09 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Parsing consumes tokens. The tokenizer already tracks parentheses (for ignoring indentation between them), so unmatched parens would throw off the tokenizer itself. It'd be reasonable to require !macros to only contain valid Python tokens, and to have matched parentheses tokens (i.e. ignoring parens in comments/string literals).
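The parenthesis tracking in question is easy to see with the stdlib tokenize module (a small sketch: a newline inside parentheses comes out as a non-logical NL token rather than NEWLINE):

    import io
    import tokenize

    src = "x = (1,\n     2)\n"
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
    # The newline inside the parentheses is reported as NL, not NEWLINE,
    # because the tokenizer keeps count of open parentheses.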

On Jun 4, 2015, at 06:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
But what would that get you? If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?

Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value?

Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)?

On 5 June 2015 at 09:03, Andrew Barnert <abarnert@yahoo.com> wrote:
The larger idea (again, keeping in mind I haven't actually fully thought through how to implement this) is to give the parsers access to the surrounding namespace, which means that the compiler needs to be made aware of any *actual* name references, and the *way* names are referenced would be parser dependent (shell variables, format string interpolation, SQL interpolation, etc). So, for example:

    print(!format(The {item} cost {amount} {units}))

Would roughly translate to:

    print("The {item} cost {amount} {units}".format(item=item, amount=amount, units=units))

It seemed relevant in this context, as a compile time AST transformation would let folks define their own pseudo-literals. Since marshal wouldn't know how to handle them, the AST produced at compile time would still need to be for a runtime constructor call rather than for a value to be stored in co_consts. These cases:

    d = !decimal(1.2)
    r = !rational(22/7)

Might simply translate directly to the following as the runtime code:

    d = decimal.Decimal("1.2")
    r = fractions.Fraction(22, 7)

With the difference being that the validity of the passed in string would be checked at compile time rather than at runtime, so you could only use it for literal values, not to construct values from variables.

As far as registration goes, yes, there'd need to be a way to hook the compiler to notify it of the existence of these compile time AST generation functions. Dave Malcolm's patch to allow parts of the compiler to be written in Python rather than C (https://bugs.python.org/issue10399) might be an interesting place to start on that front.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
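A purely runtime approximation of the !format example is already possible by peeking at the caller's namespace; a minimal sketch (format_ is a hypothetical helper, and the proposal above would resolve these names at compile time instead):

    import sys

    def format_(template):
        # Look up the interpolated names in the caller's scope at call time.
        caller = sys._getframe(1)
        names = dict(caller.f_globals, **caller.f_locals)
        return template.format(**names)

    item, amount, units = "spam", 3, "dollars"
    print(format_("The {item} cost {amount} {units}"))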

On Jun 5, 2015, at 00:06, Nick Coghlan <ncoghlan@gmail.com> wrote:
Note that, as discussed earlier in this thread, it is far easier to accidentally shadow `decimal` than something like `literal_decimal` or `bang_parser_decimal`, so there's a cost to doing this half-way at compile time, not just a benefit.

Also, a registry is definitely more "magical" than an explicit import: something some other module imported that isn't even visible in this module has changed the way this module is run, and even compiled. Of course that's true for import hooks as well, but I think in the case of import hooks there's really no avoiding the magic; in this case, there is. Obviously explicit vs. implicit isn't the only factor in usability/readability, so it's possible it would be better anyway, but I'm not sure it is.

At any rate, although you haven't shown how you expect these functions to be implemented, I think this proposal ends up being roughly equivalent to mine. Sure, the `bang_parser_decimal` function can compile the source to an AST and look up names in some way, but `literal_decimal` can do that too. And presumably whatever helper functions you were imagining to make that easier could still be written. So it's ultimately just bikeshedding the syntax, and whether you use a registry vs. normal lookup.

On 5 June 2015 at 00:03, Andrew Barnert <abarnert@yahoo.com> wrote:
If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
Well, Python bytecode has no way of holding any form of constant Decimal value, so if that's what you want you need a change to the bytecode (and hence the interpreter). I'm not sure how that qualifies as "user-defined".

We seem to be talking at cross purposes here. The questions you're asking are ones I would direct at you (assuming it's you that's after a compile-time value; I'm completely lost as to who is arguing for what any more :-() My position is that "compile-time" user-defined literals don't make sense in Python; what people actually want is probably more along the lines of "better syntax for writing constant values of user-defined types".

Oh, and just as a point of reference see http://en.cppreference.com/w/cpp/language/user_literal - C++ user defined literals translate into a *runtime* function call. So even static languages don't work the way you suggest in the comment above.

Paul

On Jun 5, 2015, at 05:18, Paul Moore <p.f.moore@gmail.com> wrote:
That's the point I was making. Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't.
Be careful of that word "constant". Python doesn't really have a distinction between constant and non-constant values. There are values of immutable and mutable types, and there are read-only attributes and members of immutable collections, but there's no such thing as a constant list value or a non-constant decimal value. So people can't be asking to create constant decimal values when they ask for literal decimal values.

So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.

Notice that the C++ committee didn't start out by trying to define "literal" so they could define "user-defined literal"; they started with a vague notion that 1.2d could be a literal in the same sense that 0x1F is, came up with a proposal for that, hashed out that proposal through a series of revisions, translated the proposal into standardese, and then pointed at it and defined "literal" in terms of that. They could have instead decided "You know what, we don't like the term 'literal' for this after all" and called it something different in the final standard, and it still would have served the same needs, and I'm fine if people want to take that tack with Python. A name isn't meaningless, but it's not the most important part of the meaning; the semantics of the feature and the idiomatic uses of it are what matter.
No, if you define the operator constexpr, and it returns a value constructed with a constexpr constructor, 1.2d is a compile-time value that can be used in further compile-time computation. That's the point I made earlier in the thread: the notion of "compile-time value" only really makes sense if you have a notion of "compile-time computation"; otherwise, it's irrelevant to any (non-reflective) computation. Therefore, the fact that my proposal leaves that part out of the C++ feature doesn't matter.

(Of course Python doesn't quite have _no_ compile-time computation; it has optional constant folding. But if you try to build on top of that without biting the bullet and just declaring the whole language accessible at compile time, you end up with the mess that was C++03, where compile-time code is slow, clumsy, and completely different from runtime code, which is a large part of why we have C++11, and also why we have D and various other languages. I don't think Python should add _anything_ new at compile time. You can always simulate compile time with import time, where the full language is available, so there's no compelling reason to make the same mistake C++ did.)

On 5 June 2015 at 16:45, Andrew Barnert <abarnert@yahoo.com> wrote:
So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.
OK, my apologies, we're basically agreeing violently, then.

IMO, people typically *actually* want a nicer syntax for Decimal values known at source-code-writing time. They probably don't actually really think much about whether the value could be affected by monkeypatching, or runtime changes, because they won't actually do that in practice. So just documenting a clear, sane and suitably Pythonic behaviour should be fine in practice (it won't stop the bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is likely to be sufficiently clear, sane and Pythonic, even if it isn't actually a "literal" in any real sense.

That's certainly true for me - I'd be happy with a syntax that worked like that.

Paul.

On Jun 5, 2015, at 08:55, Paul Moore <p.f.moore@gmail.com> wrote:
Thank you; I think you've just stated exactly my rationale in one paragraph better than all my longer attempts. :)

Well, I think it actually _is_ a literal in some useful sense, but I don't see much point in arguing about that. As long as the syntax and semantics are useful, and the name is something I can remember well enough to search for and tell other people about, I'm happy.

Anyway, the important question for me is whether people want this for any other type than Decimal (or, really, for decimal64, but unfortunately they don't have that option). That's why I created a hacky implementation, so anyone who thinks they have a good use case for fractions or a custom string type* or whatever can play with it and see if the code actually reads well to themselves and others. If it really is only Decimal that people want, we're better off with something specific rather than general.

(* My existing hack doesn't actually handle strings. Once I realized I'd left that out, I was hoping someone would bring it up, so I'd know someone was actually playing with it, at which point I can add it in a one-liner change. But apparently none of the people who downloaded it has actually tried it beyond running the included tests on 1.2d...)

On 5 June 2015 at 17:13, Andrew Barnert <abarnert@yahoo.com> wrote:
Anyway, the important question for me is whether people want this for any other type than Decimal
Personally, I don't use decimals enough to care. But I like Nick's generalised version, and I can easily imagine using that for a number of things: unevaluated code objects or SQL snippets, for example. I'd like to be able to use it as a regex literal, as well, but I don't think it lends itself to that (I suspect a bare regex would choke the Python lexer far too much).

But yes, the big question is whether it would be used sufficiently to justify the work. And of course, it'd be Python 3.6+ only, so people doing single-source code supporting older versions wouldn't be able to use it for some time anyway. That's a high bar for *any* new syntax, though, not specific to this.

Paul

On 6 Jun 2015 01:45, "Andrew Barnert" <abarnert@yahoo.com> wrote:
I was mainly replying to Paul's angle bracket syntax proposal, not specifically to anything you proposed.

The problem I have with your original suggestion is purely syntactic - I don't *want* user-defined syntax to look like language-defined syntax, because it makes it too hard for folks to know where to look things up, and I especially don't want a suffix like "j" to mean "this is a complex literal" while "k" means "this is a different way of spelling a normal function call that accepts a single string argument".

I didn't say anything about my preferred syntactic idea *only* being usable for a compile time construct, I just only consider it *interesting* if there's a compile time AST transformation component, as that lets the hook parse a string and break it down into its component parts to make it transparent to the compiler, including giving it the ability to influence the compiler's symbol table construction pass. That extra power above and beyond a normal function call is what would give the construct its rationale for requesting new syntax - it would be a genuinely new capability to integrate "not Python code" with the Python compilation toolchain, rather than an alternate spelling for existing features.

I've also been pondering the idea of how you'd notify the compiler of such hooks, since I agree you'd want them declared inline in the module that used them. For that, I think the idea of a "bang import" construct might work, where a module level line of the form "from x import !y" would not only be a normal runtime import of "y", but also allow "!y(implicitly quoted input)" as a compile time construct. There'd still be some tricky questions to resolve from a pragmatic perspective, as you'd likely need a way for the bang import to make additional runtime data available to the rendered AST produced by the bang calls, without polluting the module global namespace, but it might suffice to pass in a cell reference that is then populated at runtime by the bang import step.
That confusion is likely at least partly my fault - while this thread provided the name, the bang call concept is one I've been pondering in various forms (most coherently with some of the folks at SciPy last year) since the last time we discussed switch statements (and the related "once" statement), and it goes far beyond just defining pseudo-literals. I brought it up here, because *as a side-effect*, it would provide pseudo-literals by way of compile time constructs that didn't have any variable references in the generated AST (other than constructor references).
Updated with the bang import idea to complement the bang calls, my vague notion would actually involve adding two pieces:

* a compile time hook that lets you influence both the symbol table pass and the AST generation pass (bang import & bang call working together)
* an import time hook that lets you reliably provide required data (like references to type constructors and other functions) to the AST generated in step 1 (probably through bang import populating a cell made available to the corresponding bang call invocations)

Cheers,
Nick.

On Jun 4, 2015, at 05:08, Steven D'Aprano <steve@pearwood.info> wrote:
But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called. As an implementation-specific detail, CPython 3.4 doesn't let you modify the complex type. Python allows this, but doesn't require it, and some other implementations do let you modify it. So, if it's important to your code that 1+3j is a "static operation", then your code is non-portable at best. But once again, I suspect that the reason you haven't thought about this is that you've never written any code that actually cares what is or isn't a static operation. It's a typical "consenting adults" case.
What you're arguing here, and for the rest of the message, can be summarized in one sentence: the difference between user-defined literals and implementation-defined literals is that the former are user-defined. To which I have no real answer.
But if I asked for a turkey and cheese hoagie, and you said I have turkey and cheese and a roll, but that doesn't count as a hoagie by my definition so you can't have it, I'd say just put the turkey and cheese on the roll and call it whatever you want to call it. If people are asking for user-defined literals like 2.3d, and your argument is not that we can't or shouldn't do it, but that the term "user-defined literal" is contradictory, then the answer is the same: just call it something different.

I don't know how else to put this. I already said, in two different ways, that if you want to call it something different that's fine. You replied by saying you don't want to argue about the definition of literals, followed by multiple paragraphs arguing about the definition of literals.
If you're talking about APL or J, the number of characters might be a relevant measure of simplicity. But in the vast majority of languages, including Python, it has very little relevance. Of course "simple" is inherently a vague concept, and it will be different in different languages and contexts. But it's still one of the most important concepts. That's why language design is an art, and why we have a Zen of Python and not an Assembly Manual of Python. Trying to reduce it to something the wc program can measure means reducing it to the point of meaninglessness.

Let's give a different example. I could claim that currying makes higher-order expressions simpler. You could rightly point out that it makes the simplest function calls less simple. If we disagree on those points, or on the relative importance of them, we might draw up a bunch of examples to look at the human readability and writability or computer parsability of different expressions, in the context of idiomatic code in the language we were designing. If the rest of the language were a lot like Haskell, we'd probably agree that curried functions were simpler; if it were a lot like Python, we'd probably agree on the reverse.

But at no point would the fact that f(1,2) is one character shorter than f(1)(2) come into the discussion. The closest we'd reasonably get might be a discussion of the fact that the parens feel "big" and "get in the way" of reading the "more important" parts of the expression, or encourage the reader to naturally partition up the expression in a way that isn't appropriate to the intended meaning, or other such things. (See the "grit on Tim's monitor" appeal.) But those are still vague and subjective things. There's no objective measure to appeal to. Otherwise, for every language proposal, Guido would just run the objective simplicity measurement program and it would say yes or no.
In the case of C++, a committee actually sat down and hammered out a rigorous definition that codified the intuitive sense they were going for; if you want to read it, you can. But that isn't going to apply to anything but C++. And if you want to argue about it, the place to do so is the C++17 ISO committee. Just declaring that the C++ standard definition of literals doesn't define what you want to call literals doesn't really accomplish anything.

On Jun 4, 2015, at 12:49, Guido van Rossum <guido@python.org> wrote:
I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it". And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.)

On Jun 4, 2015, at 14:05, Guido van Rossum <guido@python.org> wrote:
OK, you can attribute that to lousy docs. The intention is that builtin types are immutable.
I can go file bugs against those other implementations, but first, what's the rationale? The ABC PEP, the numbers PEP discussion, and the type/class unification tutorial all use the same reason: In CPython, different interpreters in the same memory space (as with mod_python) share the same built-in types. From the numbers discussion, it sounds like this was the only reason to reject the idea of just patching float.__bases__. But most other Python implementations don't have process-wide globals like that to worry about; patching int in one interpreter can't possibly affect any other interpreter. "Because CPython can't do it, nobody else should do it, to keep code portable" might be a good enough rationale for something this fundamental, but if that's not the one you're thinking of, I don't want to put those words in your mouth.

On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Why do you need a better rationale? The builtins are shared between all modules in a way that other things aren't. Nothing good can come from officially recognizing the ability to monkey-patch the builtin types -- it would just lead to paranoia amongst library developers. -- --Guido van Rossum (python.org/~guido)

On Fri, Jun 5, 2015 at 6:18 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Huh. Does that imply that Brython has to construct a brand-new integer object for absolutely every operation and constant, in case someone monkeypatched something? Once integers (and other built-in types) lose their immutability, they become distinguishable:

    x = 2
    monkey_patch(x)
    y = 2

In CPython (and, I think, in the Python spec), the two 2s in x and y will be utterly indistinguishable, like fermions. CPython goes further and uses the exact same object for both 2s, *because it can*. Is there something you can do inside monkey_patch() that will "mark" one of those 2s such that it's somehow different (add an attribute, change a dunder method, etc)? And does Brython guarantee that id(x)!=id(y) because of that?

ChrisA
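For what it's worth, the "exact same object" behaviour is easy to observe in CPython, which caches small integers as an implementation detail (the language itself makes no such promise):

    x = 2
    y = 2
    print(x is y)          # True in CPython: both names refer to the cached int 2
    print(id(x) == id(y))  # likewise True in CPython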

On Jun 4, 2015, at 14:45, Chris Angelico <rosuav@gmail.com> wrote:
It's surprising that int('3') could evaluate to 4, or that print(1+2) could print 4, or that adding today and a 1-day timedelta could give you a date in 1918, or that accessing sys.stdout could play a trumpet sound and then read a 300MB file over the network, but there's nothing in the language stopping you from shadowing or replacing or monkeypatching any of those things, there's just your own common sense, and your trust in the common sense of other people working on the code with you.

And, getting this back on point: That also means there would be nothing stopping you from accidentally or maliciously redefining literal_d to play a trumpet sound and then read a 300MB file over the network instead of giving you a Decimal value, but that's not a problem the language has to solve, any more than it's a problem that you can replace int or print or sys.__getattr__.

The fact that people might overuse user-defined literals (e.g., I think using it for units, like the _ms suffix that C++'s timing library uses, is a bad idea), that's potentially a real problem. The fact that people might stupidly or maliciously interfere with some-other-user's-defined literals is not. Yes, you can surprise people that way, but Python already gives you a lot of much easier ways to surprise people. Python doesn't have a secure loader or enforced privates and constants or anything of the sort; it's designed to be used by consenting adults, and that works everywhere else, so why wouldn't it work here?

* Andrew Barnert via Python-ideas <python-ideas@python.org> [2015-06-02 12:03:25 -0700]:
This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
I actually had the exact same thing in mind recently, and never brought it up because it seemed too crazy to me. It seems I'm not the only one! :D
Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
I think a big issue is that it's non-obvious syntactic sugar. You wouldn't expect 1.2x to actually be a function call, and for newcomers this might be rather confusing...
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
That actually was the use-case I had in mind. I think

    {'spam': 1, 'eggs': 2}_o

is less ugly (and less error-prone!) than

    OrderedDict([('spam', 1), ('eggs', 2)])

Also, it's immediately apparent that it is some kind of dict.
I've built a quick&dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it.
Wow! I'm always amazed at how malleable Python is.

Florian

--
http://www.the-compiler.org | me@the-compiler.org (Mail/XMPP)
GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc
I love long mails! | http://email.is-not-s.ms/

On Jun 2, 2015, at 12:33, Florian Bruhin <me@the-compiler.org> wrote:
Well, newcomers won't be creating user-defined literals, so they won't have to even know there's a function call (unless whoever wrote the library that supplies them has a bug).
Well, I suppose that's one advantage of the literals being user-defined: you can use _o in your project, and I can not use it. :) But you still have to deal with the other issue I mentioned if you want to extend it to collection literals: again, they aren't really literals, or even easy to define except as "displays that aren't comprehensions". A quick hack like this is actually pretty easy to write (especially because in a quick hack, who cares whether using it on a comprehension gives the wrong error, or accidentally "works"); a real design and implementation may be harder.
Also, it's immediately apparent that it is some kind of dict.
That is a good point. Not that it isn't immediately apparent that OrderedDict(…) is some kind of dict as well... But compared to Swift using ArrayLiteralConvertible to define sets or C++ using array-like initializer lists to do the same thing, this is definitely not as bad.

On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice.
There's probably no solution to the literal_imal problem, but the easiest fix for literal_ump is to have 21j be parsed the same way - it's a 21 modified by j, same as 21jump is a 21 modified by jump.
Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from … import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.
I'd much rather see it be done at compile time. Something like this: compile("x = 1d/10", "<>", "exec") would immediately call literal_d("1") and embed its return value in the resulting code as a literal. (Since the peephole optimizer presumably doesn't currently understand Decimals, this would probably keep the division, but if it got enhanced, this could end up constant-folding to Decimal("0.1") before returning the code object.) So it's only the compilation step that needs to know about all those literal_* functions. Should there be a way to globally register them for default usage, or is this too much action-at-a-distance?
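Under either variant the conversion function itself would presumably be trivial; a minimal sketch (literal_d is the name used by the straw-man proposal):

    from decimal import Decimal

    def literal_d(s):
        # 1.2d would arrive here as the string '1.2', never as a float.
        return Decimal(s)

    # Runtime proposal: 1.2d compiles to literal_d('1.2'), looked up when the
    # code runs. Compile-time variant: compile() would call literal_d('1.2')
    # once and embed the resulting Decimal in the code object's constants.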
Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.
I'd be inclined to simply always provide a string. The special case would be that the quotes can sometimes be omitted, same as redundant parens on genexprs can sometimes be omitted. Otherwise, 1.2d might still produce wrong results.
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
I thought there was no such thing as a dict/list/set literal, only display syntax? In any case, that can always be left for a future extension to the proposal. ChrisA

On 6/2/2015 3:40 PM, Chris Angelico wrote:
On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas
Correct. Only number and string literals. Displays are atomic runtime expressions. 'expression_list' and 'comprehension' are alternate contents of a display. See "6.2.4. Displays for lists, sets and dictionaries" in the Language Reference.

--
Terry Jan Reedy

On Jun 2, 2015, at 12:40, Chris Angelico <rosuav@gmail.com> wrote:
Thanks; I should have thought of that--especially since that's exactly how C++ solves similar problems. (Although reserving all suffixes that don't start with an underscore for the implementation's use doesn't hurt...)
It would definitely be nicer to have it done at compile time if possible. I'm just not sure there's a good design that makes it possible. In particular, with your suggestion (which I considered), it seems a bit opaque to me that 1.2d is an error unless you _or some other module_ first imported decimalliterals; it's definitely more explicit if you (not some other module) have to from decimalliterals import literal_d. (And when you really want to be implicit, you can inject it into other modules or into builtins, the same as any other rare case where you really want to be implicit.) But many real projects are either complex enough to need centralized organization or simple enough to fit in one script, so maybe it wouldn't turn out too "magical" in practice.
Yes, that's what I thought too. The only real use case C++ has for this is allowing the same suffix to mean different things for different types, which I think would be more of a bug magnet than a feature if anyone actually did it...
That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.

On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Yeah. The significance is that literals get snapshotted into the code object as constants and simply called up when they're needed, but displays are executable code:
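A quick dis comparison makes that difference visible (a sketch; exact opcodes vary between CPython versions, but the constant tuple is loaded whole while the list display is built at runtime):

    import dis

    dis.dis(compile("x = (1, 2, 3)", "<demo>", "exec"))  # the folded tuple is a single constant
    dis.dis(compile("x = [1, 2, 3]", "<demo>", "exec"))  # the list is constructed each time the code runs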
My understanding of "literal" is something which can be processed entirely at compile time, and retained in the code object, just like strings are. Once the code's finished being compiled, there's no record of what type of string literal was used (raw, triple-quoted, etc), only the type of string object (bytes/unicode). Custom literals could be the same - come to think of it, it might be nice to have pathlib.Path literals, represented as p"/home/rosuav" or something. In any case, they'd be evaluated using only compile-time information, and would then be saved as constants. That implies that only immutables should have literal syntaxes. I'm not sure whether that's significant or not. ChrisA

On Jun 2, 2015, at 18:05, Chris Angelico <rosuav@gmail.com> wrote:
The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.)

Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway.

This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.
But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.

And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
But pathlib.Path isn't immutable. Meanwhile, that reminds me: one of the frequent selling points for Swift's related feature is for NSURL literals (which Cocoa uses for local paths as well as remote resources); I should go through the Swift selling points to see if they've found other things that the C++ community hasn't (but that can be ported to the C++ design, and that don't depend on peculiarities of Cocoa to be interesting).

On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Yes, it's a bit tricky. Part of the confusion comes from the peephole optimizer; "1+2j" looks like a constant, but it's actually a compile-time expression. It wouldn't be a big problem to have an uber-specific definition of "literal" that cuts out things like that; for the most part, it's not going to be a problem (eg if you define a fractions.Fraction literal, you could use "1/2frac" or "1frac/2" and you'd get back Fraction(1, 2) either way, simply because division of Fraction and int works correctly; you could even have a "mixed number literal" like "1+1/2frac" and it'd evaluate just fine).
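The arithmetic half of that already works with today's spelling; a quick sketch:

    from fractions import Fraction

    print(Fraction(1, 2) == Fraction(1) / 2 == 1 / Fraction(2))  # True: either 1frac/2 or 1/2frac would do
    print(1 + Fraction(1, 2))  # 3/2, i.e. Fraction(3, 2): the proposed mixed number 1+1/2frac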
Well, an additional parameter to compile() would do it. I've no idea how hard it is to write an import hook, but my notion was that you could do it that way and alter the behaviour of the compilation process. But I haven't put a lot of thought into implementation, nor do I know enough of the internals to know what's plausible and what isn't.
And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
It's to do with expectations. A literal should simply be itself, nothing else. When you have a string literal in your code, nothing can change what string that represents; at compilation time, it turns into a string object, and there it remains. Shadowing the name 'str' won't affect it. But if something that looks like a literal ends up being a function call, it could get extremely confusing - name lookups happening at run-time when the name doesn't occur in the code. Imagine the traceback:

    def calc_profit(hex):
        decimal = int(hex, 16)
        return 0.2d * decimal
Uhh... what? Sure, I shadowed the module name there, but I'm not *using* the decimal module! I'm just using a decimal literal! It's no problem to shadow the built-in function 'hex' there, because I'm not using the built-in function! Whatever name you use, there's the possibility that it'll have been changed at run-time, and that will cause no end of confusion. A literal shouldn't cause surprise function calls and name lookups.
Huh, it isn't? That's a pity. In that case, I guess you can't have a path literal. In any case, I'm sure there'll be other string-like things that people can come up with literal syntaxes for. ChrisA

On Jun 2, 2015, at 20:12, Chris Angelico <rosuav@gmail.com> wrote:
I don't understand what you mean. Sure, you can pass the magic registry a separate argument instead of leaving it in the local/global environment, but that doesn't really change anything.
It's not _that_ hard to write an import hook. But what are you going to do in that hook? If you're trying to change the syntax of Python by adding a new literal suffix, you have to rewrite the parser. (My hack gets around that by tokenizing, modifying the token stream, untokenizing, and compiling. But you don't want to do that in real life.) So I assume your idea means something like: first we parse 2.3d into something like a new UserLiteral AST node, then if no hook translates that into something else before the AST is compiled, it's a SyntaxError? But that still means:

* If you want to use a user-defined literal, you can't import it; you need another module to first import that literal's import hook and then import your module.
* Your .pyc file won't get updated when that other module changes the hooks in place when your module gets imported.
* That's a significant amount of boilerplate for each module that wants to offer a new literal.
* While it isn't actually that hard, it is something most module developers have no idea how to write. (A HOWTO could maybe help here....)
* Every import has to be hooked and transformed once for each literal you want to be available.

Meanwhile, what exactly could the hook _do_ at compile time? It could generate the expression `Decimal('1.2')`, but that's no more "literal" than `literal_d('1.2')`, and now it means your script has to import `Decimal` into its scope instead. I suppose your import hook could push that import into the top of the script, but that seems even more magical. Or maybe you could generate an actual Decimal object, pickle it, compile in the expression `pickle.loads(b'cdecimal\nDecimal\np0\n(V1.2\np1\tp2\nRp3\n.')`, and push in a pickle import, but that doesn't really solve anything.

Really, trying to force something into a "compile-time computation" in a language that doesn't have a full compile-time sub-language is a losing proposition. C++03 had a sort of accidental minimal compile-time sub-language based on template expansion and required constant folding for integer and pointer arithmetic, and that really wasn't sufficient, which is why C++11 and D both added ways to use most of the language explicitly at compile time (and C++11 still didn't get it right, which is why C++14 had to redo it). In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
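For the curious, the tokenize-and-retokenize trick mentioned above can be sketched in a few lines (a toy, assuming only the whitespace-separated NUMBER NAME form and doing no error handling; untokenize's spacing is only approximate):

    import io
    import tokenize

    def translate(source):
        toks = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if (toks and toks[-1][0] == tokenize.NUMBER
                    and tok.type == tokenize.NAME):
                # Rewrite NUMBER NAME as literal_NAME('NUMBER').
                number = toks.pop()[1]
                toks.extend([(tokenize.NAME, 'literal_' + tok.string),
                             (tokenize.OP, '('),
                             (tokenize.STRING, repr(number)),
                             (tokenize.OP, ')')])
            else:
                toks.append((tok.type, tok.string))
        return tokenize.untokenize(toks)

    print(translate("x = 1.2 d\n"))  # roughly: x = literal_d('1.2')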
But that _can't_ happen with my design: the `0.2d` is compiled to `literal_d('0.2')`. The call to `decimal.Decimal` is in that function's scope, so nothing you do in your function can interfere with it. Sure, you can still redefine `literal_d`, but (a) why would you, and (b) even if you do, the problem will be a lot more obvious (especially since you had to explicitly `from decimalliterals import literal_d` at the top of the script, while you didn't have to even mention `decimal` or `Decimal` anywhere).

But your design, or any design that does the translation at compile time, _would_ have this problem. If you compile `0.2d` directly into `decimal.Decimal('0.2')`, then it's `decimal` that has to be in scope.

Also, notice that my design leaves the door open for later coming up with a special bytecode to look up translation functions following different rules (a registry, an explicit global lookup that ignores local shadowing, etc.); translating into a normal constructor expression doesn't.
I don't understand why you think this is important. Literal values, compile-time-computable/accessible values, and run-time-constant values are certainly not unrelated, but they're not the same thing. Other languages don't try to force them to be the same. In C++, for example, a literal has to evaluate into a compile-time-computable expression that only uses constant compile-time-accessible values, but the value it doesn't have to be constant at runtime. In fact, it's quite common for it not to be.
In any case, I'm sure there'll be other string-like things that people can come up with literal syntaxes for.

On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`?
That's exactly the thing: 1.2d should be atomic. It should not be an expression. The three examples you gave are syntactically expressions, but they act very much like literals thanks to constant folding, which means they behave the way people expect them to. There is no way for run-time changes to affect what any of those expressions yields. Whether you're talking about shadowing the name Decimal or the name literal_d, the trouble is that it's happening at run-time.

Here's another confusing case:

    import decimal
    from fractionliterals import literal_fr
    # oops, forgot to import literal_d

    # If we miss off literal_fr, we get an immediate error, because
    # 1/2fr gets evaluated at def time.
    def do_stuff(x, y, portion=1/2fr):
        try:
            result = decimal.Decimal(x*y*portion)
        except OverflowError:
            return 0.0d

You won't know that your literal has failed until something actually triggers the error. That is extremely unobvious, especially since the token "literal_d" doesn't occur anywhere in do_stuff(). Literals look like atoms, and if they behave like expressions, sooner or later there'll be a ton of Stack Overflow questions saying "Why doesn't my code work? I just changed this up here, and now I get this weird error". Is that how literals should work? No.

ChrisA
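The constant folding in question is easy to reproduce (a sketch; in current CPython each of these compiles down to a single constant in the bytecode, with the exact opcode varying by version):

    import dis

    dis.dis(compile("-2", "<demo>", "eval"))
    dis.dis(compile("1+2j", "<demo>", "eval"))
    dis.dis(compile("(1, 2)", "<demo>", "eval"))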

On Jun 3, 2015, at 14:48, Chris Angelico <rosuav@gmail.com> wrote:
But that's not something that's guaranteed by Python. It's something that implementations are allowed to do, and that CPython happens to do. If user code actually relied on that optimization, that code would be nonportable. But the reason Python allows that optimization in the first place is that user code actually doesn't care whether these expressions are evaluated "atomically" or at compile time, so it's ok to do so behind users' backs. It's not surprising because no one is going to monkeypatch int.__neg__ between definition time and call time (which CPython doesn't, but some implementations do), or call dis and read the bytecode if they don't even understand what a compile-time optimization is, and so on.
If that's a problem, then you're using the wrong language. You also won't know that you've typo'd OvreflowError or reslt, or called d.sqrt() instead of decimal.sqrt(d), or all kinds of other errors until something actually triggers the error. Which means either executing the code, or running a static linter. Which would be exactly the same for 1.2d.
That is extremely unobvious, especially since the token "literal_d" doesn't occur anywhere in do_stuff().
This really isn't going to be confusing in real life. You get an error saying you forgot to define literal_d. You say, "Nuh uh, I did define it right at the top, same way I did literal_fr, in this imp... Oops, looks like I forgot to import it".
Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one. Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages. And if you're going to suggest "what if I just redefine literal_d for no reason", ask yourself who would ever do that? Redefining decimal makes sense, because that's a reasonable name for a variable; redefining literal_d is as silly as redefining __name__. (But if you think those are different because double underscores are special, I suppose __literal_d__ doesn't bother me.)

On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one.
Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages.
Anything that causes a different code path to be executed can do this. ChrisA

On Jun 3, 2015, at 16:40, Chris Angelico <rosuav@gmail.com> wrote:
Well, any expression causes a different code path to be executed than any different expression, or what would be the point? But how is this relevant here? Is there an example where 1.2d would lead to "changing this up here gives this weird error somewhere else" that doesn't apply just as well to spam.eggs (or that's relevant or likely to come up or whatever in the case of 1.2d but not in the case of spam.eggs)? Otherwise, you're just presenting an argument against dynamic languages--or maybe even against programming languages full stop (after all, the same kinds of things can happen in Haskell or C++, they just often happen at compile time, so you get to debug the same "weird error" earlier).

On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote:
The problem is that Python doesn't really define what it means by "literal" anywhere,
The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". I should have all the information needed to write a new implementation. It starts "Literals are notations for constant values of some built-in types." The relevant subsections are:

    2.4.1. String and Bytes literals
    2.4.2. String literal concatenation
    2.4.3. Numeric literals
    2.4.4. Integer literals
    2.4.5. Floating point literals
    2.4.6. Imaginary literals
and the documentation is not consistent.
I'd call it a bit sloppy in places.
Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people.
Again, the Language Reference seems sufficiently explicit and detailed to write another implementation. 2.4.3 says "There are three types of numeric literals: integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator ‘-‘ and the literal 1." I will let you read the three specific subsections.
This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere.
I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex. There are 3 types of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0." Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float'.

--
Terry Jan Reedy

I think this is off-topic, but it's important enough to answer anyway.

On Jun 2, 2015, at 21:48, Terry Reedy <tjreedy@udel.edu> wrote:
No, that defines what literals mean for the purpose of lexical analysis.
It starts "Literals are notations for constant values of some built-in types."
By the rules in this section, ..., None, True, and False are not literals, even though they are called literals everywhere else they appear in the documentation except for the Lexical Analysis chapter. In fact, even within that chapter, in 2.6 Delimiters, it explains that "A sequence of three periods has a special meaning as an ellipsis literal."

By the rules in this section, "-2" is not a literal, even though, e.g., in the data model section it says "co_consts is a tuple containing the literals used by the bytecode", and in every extant Python implementation -2 will be stored in co_consts.

By the rules in this section, "()" and "{}" are not literals, even though, e.g., in the set displays section it says "An empty set cannot be constructed with {}; this literal constructs an empty dictionary." And so on.

And that's fine. None of those things are literals for the purpose of lexical analysis, even though they are things that represent literal values. And using the word "literal" somewhat loosely isn't confusing anywhere. Where a more specific definition is needed, as when documenting the lexical analysis phase of the language, a specific definition is given.

And this is what allows ast.literal_eval to refer to "the following Python literal structures: strings, bytes, numbers, tuples, dicts, sets, booleans, and None" instead of having to say "the following Python literal structures: strings, bytes, and numbers; the negation of a literal number; the addition or subtraction of a non-imaginary literal number and an imaginary literal number; expression lists containing at least one comma; empty parentheses; the following container displays when not containing comprehensions: lists, dicts, sets; the keywords True, False, and None".

I don't think that's a bad thing. If you want to know what the "literal structure... None" means, it's easy to find out, and the fact that None is tokenized as a keyword rather than as a literal does not hamper you in any way. If you actually need to write a tokenizer, then the fact that None is tokenized as a keyword makes a difference--and you can find that out easily as well.
and the documentation is not consistent.
I'd call it a bit sloppy in places.
I wouldn't call it sloppy. I'd call it somewhat loose and informal in places, but that's often a good thing.
But it doesn't confuse people in any relevant way. The user who asked that question had no problem figuring out how to interpret code that includes a (), or even how that code should be and is compiled. He could have written a Python interpreter with the knowledge he had. Maybe he couldn't have written a specification, but who cares? He doesn't need to.
Yes, and they should also be able to tell that the integer literal "42" should evaluate to an int whose value is equal to 42, and that "the value may be approximated in the case of floating point" means that the literal "1.2" should evaluate to the float whose value is closest to 1.2 rather than some different approximation, and so on. But the documentation doesn't actually define any of that. It doesn't have to, because it assumes it's being read by a non-idiot who's capable of programming Python (and won't deliberately make stupid decisions in interpreting it just because he's technically allowed to).

The C++ specification defines all of that, and more (that the digits are interpreted with the leftmost as most significant, that the runtime value of an integer literal is not an lvalue, that it counts as a compile-time constant value, and so on). It attempts to make no assumptions at all (and there have been cases where C++ compiler vendors _have_ made deliberately obtuse interpretations just to make a point about the standard).

That's exactly why reference documentation is more useful than a specification: because it leaves out the things that should be obvious to anyone capable of programming Python. To learn how integer literals work in Python, I need to look at two short and accessible paragraphs; to learn how integer literals work in C++, I have to read 2 full-page sections plus parts of at least 2 others, all written in impenetrable legalese.

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.
In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.
Are there any use cases besides decimals? Wouldn't it be easier to just add, say, a fixed "0d" prefix for decimals?

    0x1001   # hex
    0b1001   # binary
    0d1.001  # decimal
Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
Also there's the idea floating around of making *all* dicts ordered (as PyPy has done), which would be much cleaner if it can be managed, so I'm guessing that would have to be tried and fail before any new syntax would be added for this use case. -n -- Nathaniel J. Smith -- http://vorpus.org

On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith <njs@pobox.com> wrote:
In terms of other useful options, you also have fractions. There could also be a benefit to using such a system for numbers with units, such as having the language understand 23.49MB. That said, very similar results could be achieved in most cases by merely using a normal function, without the need for special syntax. Decimal and Fraction are probably the only two major cases where you will see any actual benefit, though there may be libraries that provide other number formats that could benefit (perhaps a base-3 number?).
One benefit of the proposal is that it can be readily generalized to all literal syntax, so custom behaviors for native support of ordered dicts, trees, ordered sets, multi-sets, counters, and so forth could all be added via libraries, with little to no additional need for Python to be updated to support them directly. All in all, I'd be very mixed on such a feature. I can see plenty of cases where it would provide benefit; however, it also adds quite a bit of complexity to the language, and could easily result in code with nasty action-at-a-distance issues. If such a feature were implemented, Python would probably also want to reserve some set of the names for future language features, similar to how dunder names are reserved.
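To make the "normal function" comparison concrete, here is a minimal sketch; the helper names D and F are purely illustrative, not from any existing library:

    from decimal import Decimal
    from fractions import Fraction

    def D(text):
        # Build a Decimal from the digits written at the call site.
        return Decimal(text)

    def F(numerator, denominator=1):
        # Build an exact Fraction.
        return Fraction(numerator, denominator)

    price = D('1.2')   # the status-quo spelling of a would-be 1.2d
    ratio = F(22, 7)   # likewise for a would-be fraction literal

Much of the rest of the thread is, in effect, about whether dropping the quotes and parentheses from spellings like D('1.2') justifies new syntax.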

On Jun 2, 2015 1:32 PM, "Chris Kaynor" <ckaynor@zindagigames.com> wrote:
with units, such as having the language understand 23.49MB.

The unit libraries I've seen just spell this as "23.49 * MB" (or "22.49 * km / h" for a speed, say). And crucially they don't have any need to override the parsing rules for float literals.

-n
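A minimal sketch of that multiplication-based spelling; the constants below are made up for illustration rather than taken from any particular unit library:

    # Plain module-level constants are enough for simple unit arithmetic.
    MB = 10 ** 6      # bytes per (decimal) megabyte
    km = 1000.0       # metres per kilometre
    h = 3600.0        # seconds per hour

    size = 23.49 * MB       # 23,490,000 bytes
    speed = 22.49 * km / h  # roughly 6.25 metres per second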

On Jun 2, 2015, at 12:40, Nathaniel Smith <njs@pobox.com> wrote:
Wouldn't it be easier to just add, say, a fixed "0d" prefix for decimals?
I suggested that on the other thread, but go back and read the first paragraph of this thread. We don't want a standard literal syntax for decimal.Decimal. Some users may want it for some projects, but they should have to do something explicit to get it.

Meanwhile, a literal syntax for decimal64 would be very useful, but there's no such type in the stdlib, so anyone who wants it has to go get it on PyPI, which means the PyPI module, not Python itself, would have to supply the literal. And, since I don't know of any implementation of decimal64 without decimal32 and decimal128, I can easily imagine wanting separate literals for all three. And f or r for fraction came up in the other thread.

Beyond that? I don't know. If you look at the C++ proposal (N2750) and the various blog posts written around 2008-2012, here's what comes up repeatedly, in (to me) decreasing order of usefulness in Python:

* One or more decimal float types.
* Custom string types, like a string that iterates grapheme clusters instead of code units (Java and Swift have these; I don't know of an implementation for Python), or a mutable rope-based implementation, or the bytes-that-knows-its-encoding type that Nick Coghlan suggested some time last year.
* Integers specified in arbitrary bases.
* Quaternions or other number-like types beyond complex.
* Points or vectors represented as 3x + 4z.
* Units. Which I'm not sure is a good idea. (200*km seems just as readable to me as 200km, and only the former extends in an obvious way to 200*km/sec...) And I think the same goes for similar things like CSS units (1.2*em seems as good as 1.2_em to me).
* Various things Python already has (real string objects instead of char*, real Unicode strings, binary integers, arbitrary-precision integers, etc.).
* Cases where a constructor call would actually be just as nice, except for some other deficiency of C++ (e.g., you can't use a constexpr constructor expression as a template argument in C++11).
* Blatantly silly things, like integral radians or integral halves (which people keep saying physicists could use, only for physicists to ask "where would I use that?").
Well, OrderedDict isn't the only container class, even in the stdlib. But the real point would be types outside the stdlib. You could construct a sorted dict using blist or SortedContainers without having to first construct a dict in arbitrary order and then copy-sort it. Or build a NumPy array without building a list. And so on. But, again, I think the additional problems with container literals (which, again, aren't really literals) mean it would be worth leaving this out of any 1.0 proposal (and if containers are the only good selling point for the whole thing, that may mean the whole thing isn't worth having).
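As a rough sketch of the decimal-on-PyPI scenario above, a third-party module could supply the conversion function itself, with the context fixed once by that module. The module name, the use of literal_d, and the 16-digit precision are all assumptions for illustration:

    # hypothetical third-party module, e.g. dec64.py
    import decimal

    # A single fixed context, chosen by the package rather than by callers.
    _CONTEXT = decimal.Context(prec=16)

    def literal_d(text):
        # Receives the original token text, e.g. '1.2', and converts it
        # exactly once under the package's fixed context.
        return _CONTEXT.create_decimal(text)

A module that wanted the suffix would then need `from dec64 import literal_d` near the top, which is the import-per-module cost discussed elsewhere in the thread.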

On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
You seem to suggest that the token should start with an underscore when you write 1.2_dec and {...}_o but not when you write 1.2d and 1.2jump. Requiring the underscore solves the ambiguity and would make literals more readable. I would also require an alphabetic character after the _ and prohibit _ inside the name to avoid confusion.

    1.2_d => literal_d('1.2')
    1.2j_ump => literal_ump('1.2j')
    1.2_jump => literal_jump('1.2')
    0x12dec_imal => literal_imal('0x12dec')
    0x12_decimal => literal_decimal('0x12')
    "1.2"_ebcdic => literal_ebcdic('1.2')
    1.2d => error
    0x12decimal => error
    1_a_b => error
    1_2 => error

I do think the namespace thing is an issue but requiring me to write

    from literals import literal_jump

isn't necessarily that bad. Without an explicit import, how would I go about tracking down what exactly 21_jump means?

The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value:

    (1, 3, 4)_xyzspace
    {'a': 1 + 2}_o
    {'a', 'b': 3}_o
    ("abc")_x
    ("abc", "def")_x
    "abc" "def"_x
    ("abc" "def")_x
    ("abc" "def",)_x

--- Bruce
Check out my new puzzle book: http://J.mp/ingToConclusions
Get it free here: http://J.mp/ingToConclusionsFree (available on iOS)

On Jun 2, 2015, at 18:50, Bruce Leban <bruce@leban.us> wrote:
Well, I was suggesting leaving it up to the user who defines the literals. Sure, it's possible to come up with confusing suffixes, but if we can trust end users to name a variable that holds an XML tree "root" instead of "_12rawbytes", can't we trust library authors to name their suffixes appropriately? I think you will _often_ want the preceding underscore, at least for multi-character suffixes, and you will _almost never_ want multiple underscores, or strings of underscores and digits without letters, etc. But that seems more like something for PEP 8 and other style guides and checkers than something the language would need to enforce. However, I noticed that I left off the extra underscore in literal__dec, and it really does look pretty ugly that way, so... Maybe you have a point here.
Thanks; that's the argument I was trying to make and not making very well.
The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value:
At least two people suggested that it's better to just explicitly put that whole question of collection "literals" off for the future (assuming the basic idea of numeric and string literal suffixes is worth considering at all), and I think they're right.

On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote:
I'm torn. On the one hand, some sort of extensible syntax for literals would be nice. I say "nice" rather than useful because there are advantages and disadvantages and there's no way of really knowing which outweighs the other.

But, really, your proposal is in no way, shape or form syntax for *literals*, it's a new syntax for a unary postfix operator or function. The whole point of something being a literal is that it is parsed and converted at compile time. Now you might (and do) say that worrying about this is "premature optimization", but call me a pedant if you like, I don't think we should call something a literal if it's a runtime function call. Otherwise, we might as well say that

    from fractions import Fraction
    Fraction(2)

is a literal, in which case I can say your proposal is unnecessary as we already have user-specified literals in Python.

I can think of some interesting uses for postfix operators, or literals, or whatever we want to call them:

    45°
    10!!
    23.5d
    3d6
    35'24"
    15ell

I've deliberately not explained what I mean by each of them. You can probably guess some, or all, but I hope it demonstrates one problem with this suggestion. Like operator overloading, it risks making code less clear rather than more.

-- Steve

On Jun 2, 2015, at 19:52, Steven D'Aprano <steve@pearwood.info> wrote:
That's exactly why I came up with something I could hack up without any changes to the interpreter. It means anyone can try it out and see whether the advantages outweigh the disadvantages for them. (Of course there are additional disadvantages to the hack in efficiency, hackiness, and possibly debuggability, so it may unfairly bias people who don't keep that in mind--but if so, it can only bias them in the conservative direction of rejecting the idea, which I think is ok.)
But, really, your proposal is in no way, shape or form syntax for *literals*,
It's a syntax for things that are somewhat like `2`, more like `-2`, even more like `(2,)`, but still not exactly the same as even that. If you don't like using the word "literal" for that, you can come up with a different word. I called it a "literal" because "user-defined literals" is what people were asking for when they asked for `2.3d`, and it has clear parallels with a very similar feature with the same name in other languages. But I'm fine calling it something different, as long as people who are looking for it will know how to find it.
it's a new syntax for a unary postfix operator
That's fair; C++ in fact defines its user literal syntax in terms of special constexpr operator overloads, and points out the similarities to postfix operator++ in a note.
I don't think this is the right distinction. A literal is a notation for expressing some value that means what it says in a sufficiently simple way. That concept has significant overlap with "compile-time evaluable", and with "constant", but they're not the same concepts. And this is especially true for a language that doesn't define any compile-time computation phase. In Python, `-2` may be compiled to UNARY_NEGATIVE on the compiled-in constant value 2, or just to the compiled-in constant value -2, depending on what the implementation wants to optimize. Do you want to call it a literal in some implementations but not others? No reasonable user code that isn't reflecting on the internals is going to care, or even know, what the implementation is doing. Being "user-defined" means that the "sufficiently simple way" the notation gets its meaning has to involve user code. In a language with a compile-time computation phase like C++, that can mean "constexpr" user code, but Python doesn't define a "constexpr"-like phase. At any rate, again, if you want to call it something different, that's fine, as long as people looking for "what does `1.2d` mean in this program" or "how do I do the Python equivalent of a C++ user-defined literal" will be able to understand it.
In C++, a constructor expression like Fraction(2) may be evaluable at compile time, and may evaluate to something that's constant at both compile time and runtime, and yet it's still not a literal. Why? Because their rule for what counts as "sufficiently simple" includes constexpr postfix user-literal operators, but not constexpr function or constructor calls. I don't know of anyone who's confused by that. It's a useful (and intuitively useful) distinction, separate from the "constexpr" and "const" distinctions.
Sure. In fact, it's very closely analogous--both of them are ways to allow a user-defined type to act more like a builtin type, which can be abused to do completely different things instead. The C++ proposal specifically pointed out this comparison. I think the risk is lower in Python than in C++ just because Python idiomatically discourages magical or idiosyncratic programming much more strongly in general, and that means operator overloading is already used more consistently and less confusingly than in C++, so the same is more likely to be true with this new feature. But of course the risk isn't zero. Again, I'm hoping people will play around with it, come up with example code they can show to other people for impressions, etc., rather than trying to guess, or come up with some abstract argument. It's certainly possible that everything that looks like a good example when you think of it will look too magical to anyone who reads your code. Then the idea can be rejected, and if anyone thinks of a similar idea in the future, they can be pointed to the existing examples and asked, "Can your idea solve these problems?"

On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote:
Not really. It's a syntax for something that is not very close to *any* of those examples. Unlike all of those examples, it is a syntax for calling a function at runtime.

Let's take (-2, 1+3j) as an example. As you point out in another post, Python may constant-fold it, but isn't required to. Python 3.3 compiles it to a single constant:

    LOAD_CONST    6 ((-2, (1+3j)))

but Python 1.5 compiles it to a series of byte-code operations:

    LOAD_CONST    0 (2)
    UNARY_NEGATIVE
    LOAD_CONST    1 (1)
    LOAD_CONST    2 (3j)
    BINARY_ADD
    BUILD_TUPLE   2

But that's just implementation detail. Whether Python 3.3 or 1.5, both expressions have something in common: the *operation* is immutable (I don't mean the object itself); there is nothing you can do, from pure Python code, to make the literal (-2, 1+3j) something other than a two-tuple consisting of -2 and 1+3j. You can shadow int, complex and tuple, and it won't make a lick of difference. For lack of a better term, I'm going to call this a "static operation" (as opposed to dynamic operations like calling len(x), which can be shadowed or monkey-patched).

I don't wish to debate the definition of "literal", as that may be very difficult. For example, is 2+3j actually a literal, or an expression containing only literals? If a literal, how about 2*3**4/5 for that matter? As soon as Python compilers start doing compile-time constant folding, the boundary between literals and constant expressions becomes fuzzy. But that boundary is actually not very interesting. What is interesting is that every literal shares at least the property that I refer to above, that you cannot redefine the result of that literal at runtime by shadowing or monkey-patching.

Coming from that perspective, a literal *defined* at runtime as you suggest is a contradiction in terms. I don't care so much if the actual operation that evaluates the literal happens at runtime, so long as it is static in the above sense. If it's dynamic, then it's not a literal, it's just a function call with ugly syntax.
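For anyone who wants to see what their own interpreter does with that expression, a quick check (the output is version-dependent and, as noted above, an implementation detail):

    import dis

    # Recent CPython releases typically constant-fold the whole expression
    # into a single LOAD_CONST of the tuple (-2, (1+3j)).
    dis.dis(compile("(-2, 1+3j)", "<example>", "eval"))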
If you asked for a turkey and cheese sandwich on rye bread, and I said "Well, I haven't got any turkey, or rye, but I can give you a slice of cheese on white bread and we'll just call it a turkey and cheese rye sandwich", you probably wouldn't be impressed :-)
A literal is a notation for expressing some value that means what it says in a sufficiently simple way.
I don't think that works. "Sufficiently simple" is a problematic concept. If "123_d" is sufficiently simple, surely "d(123)" is equally simple? It's only one character more, and it's a much more familiar and conventional syntax. Especially since *_d ends up calling a function, which might as well be called d(). And if it is called d, why not a more_meaningful_name() instead? I would hope that the length of the function name is not the defining characteristic of "sufficiently simple"? (Consider 123_somereallylongbutmeaningfulnamehere.)

I don't wish to argue about other languages, but I think for Python, the important characteristic of "literals" is that they are static, as above, not "simple". An expression with nested containers isn't necessarily simple:

    {0: [1, 2, {3, 4, (5, 6)}]}  # add arbitrary levels of complexity

nor is it necessarily constructed as a compile-time constant, but it is static in the above sense.

[...]
What is the logic for that rule? If it is just an arbitrary decision that "literals cannot include parentheses" then I equally arbitrarily dismiss that rule and say "of course they can, the C++ standard notwithstanding, and the fact that Fraction(2) is a constant evaluated at compile time is proof of that fact". In any case, this is Python, and arguing over definitions from C++ is not productive. Our understanding of what makes a literal can be informed by other languages, but cannot be defined by other languages -- if for no other reason than that other languages may not all agree on what is and isn't a literal.

-- Steve

On 4 June 2015 at 13:08, Steven D'Aprano <steve@pearwood.info> wrote:
I think that the main reason that people keep asking for things like 1.2d in place of D('1.2') is basically that the use of a string literal, for some reason, "feels different". It's not a technical issue, nor is it one of compile time constants or static values - it's simply about not wanting to *think* of the process as passing a string literal to a function. They want "a syntax for a decimal" rather than "a means of getting a decimal from a string" because that's how they think of what they are doing. People aren't asking for decimal literals because they don't know that they can do D('1.2'). They want to avoid the quotes because they don't "feel right", that's all. That's why the common question is "why doesn't D(1.2) do what I expect?" rather than "how do I include a decimal constant in my program?"

"Literal" syntax is about taking a chunk of the source code as a string, and converting it into a runtime object. For built in types the syntax is known to the lexer and the compiler knows how to create the runtime constants (that applies as much to Python as to C or any other language). The fundamental question here is whether there is a Pythonic way of extending that to user-defined forms. That would have to be handled at runtime, so the *syntax* would need to be immutable, but the *semantics* could be defined in terms of runtime, without violating the spirit of the request. Such a syntax could be used for lots of things - regular expressions are a common type that gets dedicated syntax (Javascript, Perl).

As a straw man how about a new syntax (this won't work as written, because it'll clash with the "<" operator, but the basic idea works):

    LITERAL_CALL = PRIMARY "<" <any source character except right angle bracket>* ">"

which is a new option for PRIMARY alongside CALL. This translates directly into PRIMARY(str) where str is a string composed of the source characters within <...>. Decimal "literals" would then be

    from decimal import Decimal as D
    x = D<1.2>

Code objects could be compile<x+1>. Regular expressions could be

    from re import compile as RE
    regex = RE<a.*([bc]+)$>

As you can see the potential for line noise and unreadable code is there, but regular expressions always have that problem :-) Also, this proposal gives a "literal syntax" that works with existing features, rather than being a specialised add-on. Maybe that's a benefit (or maybe it's over-generalisation).

Paul
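For comparison, the spellings those three examples already have today, which is all the angle-bracket form would desugar to at runtime (the expr wrapper below is a hypothetical helper, since compile() itself needs a filename and mode in addition to the source string):

    from decimal import Decimal as D
    from re import compile as RE

    x = D('1.2')                 # what D<1.2> would turn into
    regex = RE(r'a.*([bc]+)$')   # what RE<a.*([bc]+)$> would turn into

    def expr(source):
        # compile<x+1> would need something like this thin wrapper.
        return compile(source, '<literal>', 'eval')

    code = expr('x + 1')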

On 4 June 2015 at 23:06, Paul Moore <p.f.moore@gmail.com> wrote:
The main idea I've had for compile time metaprogramming that I figured I might be able to persuade Guido not to hate is:

    python_ast, names2cells, unbound_names = !(this_is_an_arbitrary_python_expression)

As suggested by the assignment target names, the default behaviour would be to compile the expression to a Python AST, and then at runtime provide some relevant information about the name bindings referenced from it. (I haven't even attempted to implement this, although I've suggested it to some of the SciPy folks as an idea they might want to explore to make R style lazy evaluation easier)

By using the prefix+delimiters notation, it would become possible to later have variants that were similarly transparent to the compiler, but *called* a suitably registered callable at compile time to do the conversion to runtime Python objects. For example:

    !sh(shell command)
    !format(format string with implicit interpolation)
    !sql(SQL query)

So for custom numeric types, you could register:

    d = !decimal(1.2)
    r = !rational(22/7)

This isn't an idea I'm likely to have time to pursue myself any time soon (if ever), but I think it addresses the key concern with syntax customisation by ensuring that customisations have *names*, and that they're clearly distinguished from normal code.

Cheers,
Nick.

On 4 June 2015 at 14:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
The fundamental difference between this proposal and mine is (I think) that you're assuming an arbitrary Python expression in there (which is parsed), whereas I'm proposing an *unparsed* string. For example, your suggestion of !decimal(1.2) would presumably pass to the "decimal" function, an AST consisting of a literal float node for 1.2. Which has the same issues as anything else that parses 1.2 before the decimal constructor gets its hands on it - you've already lost the original that the people wanting decimal literals need access to. And I don't think your shell script example works - something like !sh(echo $PATH) would be a syntax error, surely? My proposal is specifically about allowing access to the *unevaluated* source string, to allow the runtime function to take control of the parsing. We have various functions already that take string representations and parse them to objects (Decimal, re.compile, compile...) - all I'm suggesting is a lighter-weight syntax than ("...") for "call with a string value". It's very hard to justify this, as it doesn't add any new functionality, and it doesn't add that much brevity. But it seems to me that it *does* add a strong measure of "doing what people expect" - something that's hard to quantify, but once you go looking for examples, it's applicable to a *lot* of longstanding requests. The more I look, the more uses I can think of (e.g., Windows paths via pathlib - Path<C:\Windows>). The main issue I see with my proposal (other than "Guido could well hate it" :-)) is that it has no answer to the fact that you can't include the closing delimiter in the string - as soon as you try to work around that, the syntax starts to lose its elegant simplicity *very* fast. (Raw strings have similar problems - the rules on backslashes in raw strings are clumsy at best). Like you, though, I don't have time to work on this, so it's just an idea if anyone else wants to pick up on it. Paul

On 4 June 2015 at 23:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ah, I see now what you meant. Apologies, I'd not fully understood what you were proposing. In which case yes, your proposal is strictly more powerful than mine. You still have the same problem as me, that what's inside !xxx(...) cannot contain a ")" character. (Or maybe can't contain an unmatched ")", or an unescaped ")", depending on what restrictions you feel like putting on the form of the unparsed expression...) But I think that's fundamental to any form of syntax embedding, so it's not exactly a showstopper. Paul

On Fri, Jun 5, 2015 at 2:09 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Parsing consumes tokens. The tokenizer already tracks parentheses (for ignoring indentation between them), so unmatched parens would throw off the tokenizer itself. It'd be reasonable to require !macros to only contain valid Python tokens, and have matched parentheses tokens (i.e. ignoring parens in comments/string literals).

On Jun 4, 2015, at 06:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
But what would that get you? If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value? Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)?

On 5 June 2015 at 09:03, Andrew Barnert <abarnert@yahoo.com> wrote:
The larger idea (again, keeping in mind I haven't actually fully thought through how to implement this) is to give the parsers access to the surrounding namespace, which means that the compiler needs to be made aware of any *actual* name references, and the *way* names are referenced would be parser dependent (shell variables, format string interpolation, SQL interpolation, etc). So, for example:

    print(!format(The {item} cost {amount} {units}))

Would roughly translate to:

    print("The {item} cost {amount} {units}".format(item=item, amount=amount, units=units))

It seemed relevant in this context, as a compile time AST transformation would let folks define their own pseudo-literals. Since marshal wouldn't know how to handle them, the AST produced at compile time would still need to be for a runtime constructor call rather than for a value to be stored in co_consts. These cases:

    d = !decimal(1.2)
    r = !rational(22/7)

Might simply translate directly to the following as the runtime code:

    d = decimal.Decimal("1.2")
    r = fractions.Fraction(22, 7)

With the difference being that the validity of the passed in string would be checked at compile time rather than at runtime, so you could only use it for literal values, not to construct values from variables.

As far as registration goes, yes, there'd need to be a way to hook the compiler to notify it of the existence of these compile time AST generation functions. Dave Malcolm's patch to allow parts of the compiler to be written in Python rather than C (https://bugs.python.org/issue10399) might be an interesting place to start on that front.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jun 5, 2015, at 00:06, Nick Coghlan <ncoghlan@gmail.com> wrote:
Note that, as discussed earlier in this thread, it is far easier to accidentally shadow `decimal` than something like `literal_decimal` or `bang_parser_decimal`, so there's a cost to doing this half-way at compile time, not just a benefit. Also, a registry is definitely more "magical" than an explicit import: something some other module imported that isn't even visible in this module has changed the way this module is run, and even compiled. Of course that's true for import hooks as well, but I think in the case of import hooks there's really no avoiding the magic; in this case, there is. Obviously explicit vs. implicit isn't the only factor in usability/readability, so it's possible it would be better anyway, but I'm not sure it is. At any rate, although you haven't shown how you expect these functions to be implemented, I think this proposal ends up being roughly equivalent to mine. Sure, the `bang_parser_decimal` function can compile the source to an AST and look up names in some way, but `literal_decimal` can do that too. And presumably whatever helper functions you were imagining to make that easier could still be written. So it's ultimately just bikeshedding the syntax, and whether you use a registry vs. normal lookup.

On 5 June 2015 at 00:03, Andrew Barnert <abarnert@yahoo.com> wrote:
If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?
Well, Python bytecode has no way of holding any form of constant Decimal value, so if that's what you want you need a change to the bytecode (and hence the interpreter). I'm not sure how that qualifies as "user-defined". We seem to be talking at cross purposes here. The questions you're asking are ones I would direct at you (assuming it's you that's after a compile-time value, I'm completely lost as to who is arguing for what any more :-() My position is that "compile-time" user-defined literals don't make sense in Python; what people actually want is probably more along the lines of "better syntax for writing constant values of user-defined types". Oh, and just as a point of reference see http://en.cppreference.com/w/cpp/language/user_literal - C++ user defined literals translate into a *runtime* function call. So even static languages don't work the way you suggest in the comment above.

Paul

On Jun 5, 2015, at 05:18, Paul Moore <p.f.moore@gmail.com> wrote:
That's the point I was making. Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't.
Be careful of that word "constant". Python doesn't really have a distinction between constant and non-constant values. There are values of immutable and mutable types, and there are read-only attributes and members of immutable collections, but there's no such thing as a constant list value or a non-constant decimal value. So people can't be asking to create constant decimal values when they ask for literal decimal values. So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing. Notice that the C++ committee didn't start out by trying to define "literal" so they could define "user-defined literal"; they started with a vague notion that 1.2d could be a literal in the same sense that 0x1F is, came up with a proposal for that, hashed out that proposal through a series of revisions, translated the proposal into standardese, and then pointed at it and defined "literal" in terms of that. They could have instead decided "You know what, we don't like the term 'literal' for this after all" and called it something different in the final standard, and it still would have served the same needs, and I'm fine if people want to take that tack with Python. A name isn't meaningless, but it's not the most important part of the meaning; the semantics of the feature and the idiomatic uses of it are what matter.
No, if you define the operator constexpr, and it returns a value constructed with a constexpr constructor, 1.2d is a compile-time value that can be used in further compile-time computation. That's the point I made earlier in the thread: the notion of "compile-time value" only really makes sense if you have a notion of "compile-time computation"; otherwise, it's irrelevant to any (non-reflective) computation. Therefore, the fact that my proposal leaves that part out of the C++ feature doesn't matter. (Of course Python doesn't quite have _no_ compile-time computation; it has optional constant folding. But if you try to build on top of that without biting the bullet and just declaring the whole language accessible at compile time, you end up with the mess that was C++03, where compile-time code is slow, clumsy, and completely different from runtime code, which is a large part of why we have C++11, and also why we have D and various other languages. I don't think Python should add _anything_ new at compile time. You can always simulate compile time with import time, where the full language is available, so there's no compelling reason to make the same mistake C++ did.)

On 5 June 2015 at 16:45, Andrew Barnert <abarnert@yahoo.com> wrote:
So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing.
OK, my apologies, we're basically agreeing violently, then. IMO, people typically *actually* want a nicer syntax for Decimal values known at source-code-writing time. They probably don't actually really think much about whether the value could be affected by monkeypatching, or runtime changes, because they won't actually do that in practice. So just documenting a clear, sane and suitably Pythonic behaviour should be fine in practice (it won't stop the bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is likely to be sufficiently clear, sane and Pythonic, even if it isn't actually a "literal" in any real sense. That's certainly true for me - I'd be happy with a syntax that worked like that. Paul.

On Jun 5, 2015, at 08:55, Paul Moore <p.f.moore@gmail.com> wrote:
Thank you; I think you've just stated exactly my rationale in one paragraph better than all my longer attempts. :) Well, I think it actually _is_ a literal in some useful sense, but I don't see much point in arguing about that. As long as the syntax and semantics are useful, and the name is something I can remember well enough to search for and tell other people about, I'm happy. Anyway, the important question for me is whether people want this for any other type than Decimal (or, really, for decimal64, but unfortunately they don't have that option). That's why I created a hacky implementation, so anyone who thinks they have a good use case for fractions or a custom string type* or whatever can play with it and see if the code actually reads well to themselves and others. If it really is only Decimal that people want, we're better off with something specific rather than general. (* My existing hack doesn't actually handle strings. Once I realized I'd left that out, I was hoping someone would bring it up, so I'd know someone was actually playing with it, at which point I can add it in a one-liner change. But apparently none of the people who downloaded it has actually tried it beyond running the included tests on 1.2d...)

On 5 June 2015 at 17:13, Andrew Barnert <abarnert@yahoo.com> wrote:
Anyway, the important question for me is whether people want this for any other type than Decimal
Personally, I don't use decimals enough to care. But I like Nick's generalised version, and I can easily imagine using that for a number of things: unevaluated code objects or SQL snippets, for example. I'd like to be able to use it as a regex literal, as well, but I don't think it lends itself to that (I suspect a bare regex would choke the Python lexer far too much). But yes, the big question is whether it would be used sufficiently to justify the work. And of course, it'd be Python 3.6+ only, so people doing single-source code supporting older versions wouldn't be able to use it for some time anyway. That's a high bar for *any* new syntax, though, not specific to this.

Paul

On 6 Jun 2015 01:45, "Andrew Barnert" <abarnert@yahoo.com> wrote:
If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled?

Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't.

I was mainly replying to Paul's angle bracket syntax proposal, not specifically to anything you proposed. The problem I have with your original suggestion is purely syntactic - I don't *want* user-defined syntax to look like language-defined syntax, because it makes it too hard for folks to know where to look things up, and I especially don't want a suffix like "j" to mean "this is a complex literal" while "k" means "this is a different way of spelling a normal function call that accepts a single string argument".

I didn't say anything about my preferred syntactic idea *only* being usable for a compile time construct, I just only consider it *interesting* if there's a compile time AST transformation component, as that lets the hook parse a string and break it down into its component parts to make it transparent to the compiler, including giving it the ability to influence the compiler's symbol table construction pass. That extra power above and beyond a normal function call is what would give the construct its rationale for requesting new syntax - it would be a genuinely new capability to integrate "not Python code" with the Python compilation toolchain, rather than an alternate spelling for existing features.

I've also been pondering the idea of how you'd notify the compiler of such hooks, since I agree you'd want them declared inline in the module that used them. For that, I think the idea of a "bang import" construct might work, where a module level line of the form "from x import !y" would not only be a normal runtime import of "y", but also allow "!y(implicitly quoted input)" as a compile time construct. There'd still be some tricky questions to resolve from a pragmatic perspective, as you'd likely need a way for the bang import to make additional runtime data available to the rendered AST produced by the bang calls, without polluting the module global namespace, but it might suffice to pass in a cell reference that is then populated at runtime by the bang import step.
That confusion is likely at least partly my fault - while this thread provided the name, the bang call concept is one I've been pondering in various forms (most coherently with some of the folks at SciPy last year) since the last time we discussed switch statements (and the related "once" statement), and it goes far beyond just defining pseudo-literals. I brought it up here, because *as a side-effect*, it would provide pseudo-literals by way of compile time constructs that didn't have any variable references in the generated AST (other than constructor references).
Updated with the bang import idea to complement the bang calls, my vague notion would actually involve adding two pieces:

* a compile time hook that lets you influence both the symbol table pass and the AST generation pass (bang import & bang call working together)
* an import time hook that lets you reliably provide required data (like references to type constructors and other functions) to the AST generated in step 1 (probably through bang import populating a cell made available to the corresponding bang call invocations)

Cheers,
Nick.

On Jun 4, 2015, at 05:08, Steven D'Aprano <steve@pearwood.info> wrote:
But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called. As an implementation-specific detail, CPython 3.4 doesn't let you modify the complex type. Python allows this, but doesn't require it, and some other implementations do let you modify it. So, if it's important to your code that 1+3j is a "static operation", then your code is non-portable at best. But once again, I suspect that the reason you haven't thought about this is that you've never written any code that actually cares what is or isn't a static operation. It's a typical "consenting adults" case.
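A concrete way to observe the CPython behaviour described above; the exact error message varies between versions:

    # On CPython, built-in types reject attribute assignment outright.
    try:
        complex.__radd__ = lambda self, other: 42
    except TypeError as exc:
        print(exc)  # e.g. "can't set attributes of built-in/extension type 'complex'"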
What you're arguing here, and for the rest of the message, can be summarized in one sentence: the difference between user-defined literals and implementation-defined literals is that the former are user-defined. To which I have no real answer.
But if I asked for a turkey and cheese hoagie, and you said I have turkey and cheese and a roll, but that doesn't count as a hoagie by my definition so you can't have it, I'd say just put the turkey and cheese on the roll and call it whatever you want to call it. If people are asking for user-defined literals like 2.3d, and your argument is not that we can't or shouldn't do it, but that the term "user-defined literal" is contradictory, then the answer is the same: just call it something different. I don't know how else to put this. I already said, in two different ways, that if you want to call it something different that's fine. You replied by saying you don't want to argue about the definition of literals, followed by multiple paragraphs arguing about the definition of literals.
If you're talking about APL or J, the number of characters might be a relevant measure of simplicity. But in the vast majority of languages, including Python, it has very little relevance. Of course "simple" is inherently a vague concept, and it will be different in different languages and contexts. But it's still one of the most important concepts. That's why language design is an art, and why we have a Zen of Python and not an Assembly Manual of Python. Trying to reduce it to something the wc program can measure means reducing it to the point of meaninglessness.

Let's give a different example. I could claim that currying makes higher-order expressions simpler. You could rightly point out that it makes the simplest function calls less simple. If we disagree on those points, or on the relative importance of them, we might draw up a bunch of examples to look at the human readability and writability or computer parsability of different expressions, in the context of idiomatic code in the language we were designing. If the rest of the language were a lot like Haskell, we'd probably agree that curried functions were simpler; if it were a lot like Python, we'd probably agree on the reverse.

But at no point would the fact that f(1,2) is one character shorter than f(1)(2) come into the discussion. The closest we'd reasonably get might be a discussion of the fact that the parens feel "big" and "get in the way" of reading the "more important" parts of the expression, or encourage the reader to naturally partition up the expression in a way that isn't appropriate to the intended meaning, or other such things. (See the "grit on Tim's monitor" appeal.) But those are still vague and subjective things. There's no objective measure to appeal to. Otherwise, for every language proposal, Guido would just run the objective simplicity measurement program and it would say yes or no.
In the case of C++, a committee actually sat down and hammered out a rigorous definition that codified the intuitive sense they were going for; if you want to read it, you can. But that isn't going to apply to anything but C++. And if you want to argue about it, the place to do so is the C++17 ISO committee. Just declaring that the C++ standard definition of literals doesn't define what you want to call literals doesn't really accomplish anything.

On Jun 4, 2015, at 12:49, Guido van Rossum <guido@python.org> wrote:
I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it". And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.)

On Jun 4, 2015, at 14:05, Guido van Rossum <guido@python.org> wrote:
OK, you can attribute that to lousy docs. The intention is that builtin types are immutable.
I can go file bugs against those other implementations, but first, what's the rationale? The ABC PEP, the numbers PEP discussion, and the type/class unification tutorial all use the same reason: In CPython, different interpreters in the same memory space (as with mod_python) share the same built-in types. From the numbers discussion, it sounds like this was the only reason to reject the idea of just patching float.__bases__. But most other Python implementations don't have process-wide globals like that to worry about; patching int in one interpreter can't possibly affect any other interpreter. "Because CPython can't do it, nobody else should do it, to keep code portable" might be a good enough rationale for something this fundamental, but if that's not the one you're thinking of, I don't want to put those words in your mouth.

On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Why do you need a better rationale? The builtins are shared between all modules in a way that other things aren't. Nothing good can come from officially recognizing the ability to monkey-patch the builtin types -- it would just lead to paranoia amongst library developers. -- --Guido van Rossum (python.org/~guido)

On Fri, Jun 5, 2015 at 6:18 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Huh. Does that imply that Brython has to construct a brand-new integer object for absolutely every operation and constant, in case someone monkeypatched something? Once integers (and other built-in types) lose their immutability, they become distinguishable:

    x = 2
    monkey_patch(x)
    y = 2

In CPython (and, I think, in the Python spec), the two 2s in x and y will be utterly indistinguishable, like fermions. CPython goes further and uses the exact same object for both 2s, *because it can*. Is there something you can do inside monkey_patch() that will "mark" one of those 2s such that it's somehow different (add an attribute, change a dunder method, etc)? And does Brython guarantee that id(x) != id(y) because of that?

ChrisA
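The CPython-specific caching Chris mentions is easy to observe; the `is` results below are an implementation detail, not something the language guarantees:

    x = 2
    y = 2
    print(x is y)           # True on CPython: both names refer to the cached int 2
    print(id(x) == id(y))   # equivalently, the two objects have the same identity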

On Jun 4, 2015, at 14:45, Chris Angelico <rosuav@gmail.com> wrote:
It's surprising that int('3') could evaluate to 4, or that print(1+2) could print 4, or that adding today and a 1-day timedelta could give you a date in 1918, or that accessing sys.stdout could play a trumpet sound and then read a 300MB file over the network, but there's nothing in the language stopping you from shadowing or replacing or monkeypatching any of those things, there's just your own common sense, and your trust in the common sense of other people working on the code with you. And, getting this back on point: That also means there would be nothing stopping you from accidentally or maliciously redefining literal_d to play a trumpet sound and then read a 300MB file over the network instead of giving you a Decimal value, but that's not a problem the language has to solve, any more than it's a problem that you can replace int or print or sys.__getattr__. The fact that people might overuse user-defined literals (e.g., I think using it for units, like the _ms suffix that C++'s timing library uses, is a bad idea), that's potentially a real problem. The fact that people might stupidly or maliciously interfere with some-other-user's-defined literals is not. Yes, you can surprise people that way, but Python already gives you a lot of much easier ways to surprise people. Python doesn't have a secure loader or enforced privates and constants or anything of the sort; it's designed to be used by consenting adults, and that works everywhere else, so why wouldn't it work here?
participants (15)
- Andrew Barnert
- Bruce Leban
- Chris Angelico
- Chris Kaynor
- Florian Bruhin
- Guido van Rossum
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- random832@fastmail.us
- Ryan Gonzalez
- Steven D'Aprano
- Terry Reedy
- Yury Selivanov