[Python-ideas] User-defined literals

Thu Jun 4 14:08:36 CEST 2015

On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote:
> On Jun 2, 2015, at 19:52, Steven D'Aprano <steve at pearwood.info> wrote:
[...]
> > But, really, your proposal is in no way, shape or form syntax for 
> > *literals*,
> 
> It's a syntax for things that are somewhat like `2`, more like `-2`, 
> even more like `(2,)`, but still not exactly the same as even that.

Not really. It's a syntax for something that is not very close to *any* 
of those examples. Unlike all of those examples, it is a syntax for 
calling a function at runtime.

Let's take (-2, 1+3j) as an example. As you point out in another post, 
Python may constant-fold it, but isn't required to. Python 3.3 compiles 
it to a single constant:

  LOAD_CONST               6 ((-2, (1+3j)))

but Python 1.5 compiles it to a series of byte-code operations:

  LOAD_CONST          0 (2)
  UNARY_NEGATIVE
  LOAD_CONST          1 (1)
  LOAD_CONST          2 (3j)
  BINARY_ADD
  BUILD_TUPLE         2

But that's just implementation detail. Whether Python 3.3 or 1.5, both 
expressions have something in common: the *operation* is immutable (I 
don't mean the object itself); there is nothing you can do, from pure 
python code, to make the literal (-2, 1+3j) something other than a 
two-tuple consisting of -2 and 1+3j. You can shadow int, complex and 
tuple, and it won't make a lick of difference. For lack of a better 
term, I'm going to call this a "static operation" (as opposed to dynamic 
operations like calling len(x), which can be shadowed or monkey-patched).
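
A minimal sketch of the difference, assuming nothing beyond the builtins. 
The helper name evaluate_with_shadowed_builtins and the "hijacked" 
function are made up for illustration:

```python
def evaluate_with_shadowed_builtins(source):
    """Evaluate source in a namespace where int, complex, tuple and len
    are all shadowed by a function that returns the string 'hijacked'."""
    def hijacked(*args, **kwargs):
        return "hijacked"
    namespace = {name: hijacked for name in ("int", "complex", "tuple", "len")}
    return eval(source, namespace)

# The literal ignores the shadowed names: a "static operation".
literal_result = evaluate_with_shadowed_builtins("(-2, 1+3j)")

# The function call looks up len at runtime and finds the shadow: dynamic.
call_result = evaluate_with_shadowed_builtins("len('abc')")

print(literal_result)
print(call_result)
```

The first result is still an ordinary two-tuple; only the second is 
affected by the shadowing.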

I don't wish to debate the definition of "literal", as that may be very 
difficult. For example, is 2+3j actually a literal, or an expression 
containing only literals? If a literal, how about 2*3**4/5 for that 
matter? As soon as Python compilers start doing compile-time constant 
folding, the boundary between literals and constant expressions becomes 
fuzzy. But that boundary is actually not very interesting. What is 
interesting is that every literal shares at least the property that I 
refer to above, that you cannot redefine the result of that literal at 
runtime by shadowing or monkey-patching.

Coming from that perspective, a literal *defined* at runtime as you 
suggest is a contradiction in terms. I don't care so much if the actual 
operation that evaluates the literal happens at runtime, so long as it 
is static in the above sense. If it's dynamic, then it's not a literal, 
it's just a function call with ugly syntax.
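
To illustrate the "dynamic" case, here is a rough sketch of a suffix 
scheme that looks its handler up at runtime. The registry 
suffix_handlers and the function evaluate_suffixed are hypothetical 
stand-ins, not the mechanism of any actual proposal; the point is only 
that the same token can mean different things after a rebinding.

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical registry: suffix -> callable that builds the value.
suffix_handlers = {"d": Decimal}

def evaluate_suffixed(text, suffix):
    """Stand-in for evaluating a token such as 123_d: find the handler
    for the suffix at runtime and call it on the digits."""
    return suffix_handlers[suffix](text)

before = evaluate_suffixed("123", "d")   # built by Decimal

# Any code can rebind the handler, changing what "123_d" produces.
suffix_handlers["d"] = Fraction
after = evaluate_suffixed("123", "d")    # now built by Fraction

print(type(before).__name__, type(after).__name__)
```
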

> If 
> you don't like using the word "literal" for that, you can come up with 
> a different word. I called it a "literal" because "user-defined 
> literals" is what people were asking for when they asked for `2.3d`, 

If you asked for a turkey and cheese sandwich on rye bread, and I said 
"Well, I haven't got any turkey, or rye, but I can give you a slice of 
cheese on white bread and we'll just call it a turkey and cheese rye 
sandwich", you probably wouldn't be impressed :-)

> A literal is a notation for expressing some value that means what it 
> says in a sufficiently simple way.

I don't think that works. "Sufficiently simple" is a problematic 
concept. If "123_d" is sufficiently simple, surely "d(123)" is equally 
simple? It's only one character more, and it's a much more familiar 
and conventional syntax.

Especially since *_d ends up calling a function, which might as well be 
called d(). And if it is called d, why not a more_meaningful_name() 
instead? I would hope that the length of the function name is not the 
defining characteristic of "sufficiently simple"? (Consider 
123_somereallylongbutmeaningfulnamehere.)

I don't wish to argue about other languages, but I think for Python, the 
important characteristic of "literals" is that they are static, as 
above, not "simple". An expression with nested containers isn't 
necessarily simple:

    {0: [1, 2, {3, 4, (5, 6)}]}  # add arbitrary levels of complexity

nor is it necessarily constructed as a compile-time constant, but it is 
static in the above sense. 
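
Both halves of that claim can be checked with a short sketch. The 
function name refuse is made up for illustration; it shadows the 
container constructors and raises if anything calls them.

```python
source = "{0: [1, 2, {3, 4, (5, 6)}]}"

def refuse(*args, **kwargs):
    # Would only run if the display looked up dict/list/set/tuple by name.
    raise RuntimeError("shadowed constructor was called")

shadowed = {name: refuse for name in ("dict", "list", "set", "tuple")}

# Evaluate the display twice, each time with the constructors shadowed.
first = eval(source, dict(shadowed))
second = eval(source, dict(shadowed))

# Static: the shadows are never consulted, so no RuntimeError is raised.
# Not a constant: each evaluation builds fresh, distinct mutable objects.
print(first == second, first is second, first[0] is second[0])
```
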

[...]
> > Otherwise, we might as well say that 
> > 
> >    from fractions import Fraction
> >    Fraction(2)
> > 
> > is a literal, in which case I can say your proposal is unnecessary as we 
> > already have user-specified literals in Python.
> 
> In C++, a constructor expression like Fraction(2) may be evaluable at 
> compile time, and may evaluate to something that's constant at both 
> compile time and runtime, and yet it's still not a literal. Why? 
> Because their rule for what counts as "sufficiently simple" includes 
> constexpr postfix user-literal operators, but not constexpr function 
> or constructor calls.

What is the logic for that rule? If it is just an arbitrary decision 
that "literals cannot include parentheses" then I equally arbitrarily 
dismiss that rule and say "of course they can, the C++ standard 
notwithstanding, and the fact that Fraction(2) is a constant evaluated 
at compile time is proof of that fact".

In any case, this is Python, and arguing over definitions from C++ is 
not productive. Our understanding of what makes a literal can be 
informed by other languages, but cannot be defined by other languages -- 
if for no other reason than that other languages may not all agree on what 
is and isn't a literal.

-- 
Steve