[Python-ideas] User-defined literals

Wed Jun 3 05:12:52 CEST 2015

On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
> On Jun 2, 2015, at 18:05, Chris Angelico <rosuav at gmail.com> wrote:
>> My understanding of "literal" is something which can be processed
>> entirely at compile time, and retained in the code object, just like
>> strings are.
>
> The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting the tutorial and HOWTOs) where the Python 3.4 docs refer to list or dict literals. (That's not based on a search; someone asked a Stack Overflow question about what those two places meant.)
>
> Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... that's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway.
>
> This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine.
>

Yes, it's a bit tricky. Part of the confusion comes from the peephole
optimizer; "1+2j" looks like a constant, but it's actually a
compile-time expression. It wouldn't be hard to have an
uber-specific definition of "literal" that cuts out things like that,
and for the most part it wouldn't matter (e.g., if you define a
fractions.Fraction literal, you could write "1/2frac" or "1frac/2" and
you'd get back Fraction(1, 2) either way, simply because division
between a Fraction and an int works correctly; you could even have a
"mixed number literal" like "1+1/2frac" and it'd evaluate just fine).
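Both points can be checked in today's Python with only the standard library. The `frac` suffix is hypothetical, so in this sketch `Fraction(2)` stands in for `2frac` and `Fraction(1)` for `1frac`. `ast.parse` (which does no constant folding) shows that `1+2j` and `-2` are expressions rather than single constants; CPython's optimizer typically folds them into constants later in compilation.

```python
import ast
from fractions import Fraction

# Neither source string parses to a single Constant node:
# "1+2j" is an int plus an imaginary literal, "-2" is unary minus applied to 2.
complex_expr = ast.parse("1+2j", mode="eval").body
negative_expr = ast.parse("-2", mode="eval").body
print(type(complex_expr).__name__)   # BinOp
print(type(negative_expr).__name__)  # UnaryOp

# Stand-ins for the hypothetical suffix: "2frac" -> Fraction(2), "1frac" -> Fraction(1).
half_a = 1 / Fraction(2)        # like "1/2frac"
half_b = Fraction(1) / 2        # like "1frac/2"
mixed = 1 + 1 / Fraction(2)     # like "1+1/2frac", a "mixed number"

print(half_a, half_b, mixed)    # 1/2 1/2 3/2
```

Whichever operand carries the suffix, ordinary int/Fraction arithmetic produces the same value.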

>> Once the code's finished being compiled, there's no
>> record of what type of string literal was used (raw, triple-quoted,
>> etc), only the type of string object (bytes/unicode). Custom literals
>> could be the same
>
> But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here.
>

Well, an additional parameter to compile() would do it. I've no idea
how hard it is to write an import hook, but my notion was that you
could do it that way and alter the behaviour of the compilation
process. But I haven't put a lot of thought into implementation, nor
do I know enough of the internals to know what's plausible and what
isn't.
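One way to prototype "altering the compilation process" without touching the interpreter is a source-to-source rewrite before calling `compile()`. This is a toy under loud assumptions: the `d` suffix, the helper names `expand_decimal_literals` and `compile_with_decimal_literals`, and the regex are all hypothetical, and the regex does not skip string literals or comments. An import hook could apply the same kind of rewrite in its loader, for instance by overriding `importlib.abc.InspectLoader.source_to_code`. Note that the rewritten code still looks up the name `Decimal` at run time.

```python
import re
from decimal import Decimal

# Hypothetical rule: digits with an optional fractional part, immediately
# followed by "d", become a Decimal(...) call on the same digits.
# Toy only: it does not skip strings or comments and ignores exponents.
_DECIMAL_SUFFIX = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)d\b")


def expand_decimal_literals(source: str) -> str:
    """Rewrite e.g. '1.2d' into 'Decimal("1.2")' (hypothetical helper)."""
    return _DECIMAL_SUFFIX.sub(r'Decimal("\1")', source)


def compile_with_decimal_literals(source: str, filename: str = "<string>",
                                  mode: str = "exec"):
    """Hypothetical stand-in for 'compile() with an extra option'."""
    return compile(expand_decimal_literals(source), filename, mode)


code = compile_with_decimal_literals("price = 1.2d * 3")
namespace = {"Decimal": Decimal}   # the name must be resolvable at run time
exec(code, namespace)
print(namespace["price"])          # 3.6
```

If `Decimal` is missing from the namespace the compiled code runs in, the "literal" fails with a `NameError`, because it is really a function call.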

> And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'); it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file?
>

It's to do with expectations. A literal should simply be itself,
nothing else. When you have a string literal in your code, nothing can
change what string that represents; at compilation time, it turns into
a string object, and there it remains. Shadowing the name 'str' won't
affect it. But if something that looks like a literal ends up being a
function call, it could get extremely confusing - name lookups
happening at run-time when the name doesn't occur in the code. Imagine
the traceback:

def calc_profit(hex):
    decimal = int(hex, 16)
    return 0.2d * decimal

>>> calc_profit("1E2A")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in calc_profit
AttributeError: 'int' object has no attribute 'Decimal'

Uhh... what? Sure, I shadowed the module name there, but I'm not
*using* the decimal module! I'm just using a decimal literal! It's no
problem to shadow the built-in function 'hex' there, because I'm not
using the built-in function!
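That traceback can be reproduced in today's Python, assuming (hypothetically) that `0.2d` were compiled into a call through the module name, i.e. `decimal.Decimal("0.2")`. Because `decimal` is assigned inside the function, it is a local name for the whole function body, so the module-level import is never consulted:

```python
import decimal  # the module is imported, but the function never sees it


def calc_profit(hex):
    decimal = int(hex, 16)  # local name shadows the module inside this function
    # Hypothetical desugaring of "0.2d": a run-time attribute lookup on "decimal".
    return decimal.Decimal("0.2") * decimal


try:
    calc_profit("1E2A")
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'Decimal'
```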

Whatever name you use, there's the possibility that it'll have been
changed at run-time, and that will cause no end of confusion. A
literal shouldn't cause surprise function calls and name lookups.
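The difference is visible in CPython's code objects: a string literal is stored in the function's `co_consts` and loaded as-is, so rebinding `str` has no effect on it, whereas anything spelled as a call resolves a name when the function runs. A small illustration (the function names are made up for the example):

```python
def literal_only():
    str = bytes          # rebinding the name 'str' locally...
    return "hello"       # ...does not change what this literal produces


def via_call():
    str = bytes          # the same rebinding...
    return str(5)        # ...does change what this call produces


print(type(literal_only()).__name__)               # str
print(type(via_call()).__name__)                   # bytes
print("hello" in literal_only.__code__.co_consts)  # True
```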

>> - come to think of it, it might be nice to have
>> pathlib.Path literals, represented as p"/home/rosuav" or something. In
>> any case, they'd be evaluated using only compile-time information, and
>> would then be saved as constants.
>>
>> That implies that only immutables should have literal syntaxes. I'm
>> not sure whether that's significant or not.
>
> But pathlib.Path isn't immutable.

Huh, it isn't? That's a pity. In that case, I guess you can't have a
path literal. In any case, I'm sure there'll be other string-like
things that people can come up with literal syntaxes for.
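The immutability point ties back to constants: in CPython, a value loaded from `co_consts` is the same object every time the code runs, so sharing it is only safe if it cannot be mutated. As a sketch of this CPython implementation detail, a tuple display of constants is folded into one stored tuple, while a list display builds a fresh list on every call:

```python
def tuple_display():
    return (1, 2, 3)   # CPython folds this into a single stored constant


def list_display():
    return [1, 2, 3]   # a new list is built on every call


print(tuple_display() is tuple_display())  # True in CPython
print(list_display() is list_display())    # False: each call builds a new list
```

If a mutable object were stored and shared that way, a mutation made during one call would be visible in every later call.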

ChrisA