[Python-ideas] User-defined literals

Andrew Barnert abarnert at yahoo.com
Wed Jun 3 03:56:13 CEST 2015


On Jun 2, 2015, at 18:05, Chris Angelico <rosuav at gmail.com> wrote:
> 
> On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.
>>> 
>>> I thought there was no such thing as a dict/list/set literal, only
>>> display syntax?
>> 
>> That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem.
> 
> Yeah. The significance is that literals get snapshotted into the code
> object as constants and simply called up when they're needed, but
> displays are executable code:
> 
> >>> import dis
> >>> dis.dis(lambda: "Literal")
>  1           0 LOAD_CONST               1 ('Literal')
>              3 RETURN_VALUE
> >>> dis.dis(lambda: ["List","Display"])
>  1           0 LOAD_CONST               1 ('List')
>              3 LOAD_CONST               2 ('Display')
>              6 BUILD_LIST               2
>              9 RETURN_VALUE
> >>> dis.dis(lambda: ("Tuple","Literal"))
>  1           0 LOAD_CONST               3 (('Tuple', 'Literal'))
>              3 RETURN_VALUE
> 
> My understanding of "literal" is something which can be processed
> entirely at compile time, and retained in the code object, just like
> strings are.

The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting the tutorial and howtos) where the Python 3.4 documentation refers to list or dict literals. (That's not based on a search; someone wrote a Stack Overflow question asking what those two places meant.)

Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway.
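
To make the loose usage concrete, here is a minimal sketch of how the parser classifies a few of these forms. It uses only the stdlib ast module and assumes a CPython recent enough (3.8+) that constants appear as ast.Constant nodes; the helper name node_kind is made up for illustration.

```python
import ast

def node_kind(source):
    """Return the AST node class name for one expression (illustrative helper)."""
    return type(ast.parse(source, mode="eval").body).__name__

# A plain number, a negated number, a "complex literal", a dict display,
# and a dict comprehension.
samples = ["2", "-2", "1+2j", "{'spam': 1}", "{k: 1 for k in 'ab'}"]

for src in samples:
    print(f"{src!r:26} -> {node_kind(src)}")
```

Only the first sample parses as a single constant. -2 is a unary operation on a constant, 1+2j is a binary operation on two constants, and the dict display and the comprehension are distinct node types. Whether the compiler later folds some of these into constants is a separate, implementation-specific question.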

This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.

> Once the code's finished being compiled, there's no
> record of what type of string literal was used (raw, triple-quoted,
> etc), only the type of string object (bytes/unicode). Custom literals
> could be the same

But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.

And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
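
As a rough illustration of the "not any worse" point, the sketch below checks what the compiler actually stores for Decimal('1.2') today, and then shows one way a suffix could behave if it were simply a run-time call. The function name literal_d and the idea that 1.2d would turn into a call to it are assumptions for illustration only, not something Python provides.

```python
from decimal import Decimal

# What the compiler keeps for the existing spelling: the string is a
# constant, but Decimal is just a name that is looked up and called at run time.
code = compile("Decimal('1.2')", "<example>", "eval")
print(code.co_consts)   # contains the string '1.2', not a Decimal object
print(code.co_names)    # contains the name 'Decimal'

# Hypothetical: if `1.2d` were treated as a run-time call on the literal's
# source text, it would do the same work as the explicit constructor call.
def literal_d(text):
    """Assumed handler for a `d` suffix (name invented for this sketch)."""
    return Decimal(text)

value = literal_d("1.2")
print(value == eval(code))
```

Under that assumption the suffix form and the explicit call do the same work at run time, so nothing is lost by not storing the value as a constant.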

> - come to think of it, it might be nice to have
> pathlib.Path literals, represented as p"/home/rosuav" or something. In
> any case, they'd be evaluated using only compile-time information, and
> would then be saved as constants.
> 
> That implies that only immutables should have literal syntaxes. I'm
> not sure whether that's significant or not.

But pathlib.Path isn't immutable.

Meanwhile, that reminds me: one of the frequently cited selling points for Swift's related feature is NSURL literals (Cocoa uses NSURL for local paths as well as remote resources). I should go through the Swift selling points to see if they've found other uses that the C++ community hasn't (ones that can be ported to the C++ design and that don't depend on peculiarities of Cocoa to be interesting).

