[Python-ideas] User-defined literals

Wed Jun 3 20:26:54 CEST 2015

I think this is off-topic, but it's important enough to answer anyway.

On Jun 2, 2015, at 21:48, Terry Reedy <tjreedy at udel.edu> wrote:
> 
>> On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote:
>> 
>> The problem is that Python doesn't really define what it means by
>> "literal" anywhere,
> 
> The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". It should have all the information needed to write a new implementation.

No, that defines what literals mean for the purpose of lexical analysis.

> It starts "Literals are notations for constant values of some built-in types."

By the rules in this section, "...", None, True, and False are not literals, even though the documentation calls them literals everywhere else they appear outside the Lexical Analysis chapter. In fact, even within that chapter, section 2.6 (Delimiters) explains that "A sequence of three periods has a special meaning as an ellipsis literal."

By the rules in this section, "-2" is not a literal, even though, e.g., in the data model section it says "co_consts is a tuple containing the literals used by the bytecode", and in every extant Python implementation -2 will be stored in co_consts.
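To make the "-2" case concrete, here is a minimal sketch using the stdlib tokenize module and compile(). The variable names are mine, and the folding of -2 into a single constant is what CPython does, not something the reference manual promises:

```python
import io
import tokenize

source = "x = -2\n"

# The tokenizer sees an operator followed by a number, not one literal.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.OP, tokenize.NUMBER)
]
print(tokens)

# The compiled code object nevertheless stores -2 as a single constant
# (assumption: a CPython-style compiler that folds the unary minus).
code = compile(source, "<example>", "exec")
print(code.co_consts)
```

So the same two characters are "an operator and a literal" to the lexer and "a literal used by the bytecode" to the data model, and both descriptions are fine in their own context.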

By the rules in this section, "()" and "{}" are not literals, even though, e.g., in the set displays section it says "An empty set cannot be constructed with {}; this literal constructs an empty dictionary."

And so on.

And that's fine. None of those things are literals for the purpose of lexical analysis, even though they are things that represent literal values.
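If you want to see the two views side by side, something like the following should do. It is only a sketch: token_kinds and node_kind are helper names I made up, and the AST class names assume Python 3.8 or later, where None, True, and ... all parse to ast.Constant:

```python
import ast
import io
import tokenize


def token_kinds(source):
    """Return the tokenizer's type names for the meaningful tokens in source."""
    skip = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in skip
    ]


def node_kind(source):
    """Return the name of the AST node class the expression parses to."""
    return type(ast.parse(source, mode="eval").body).__name__


for source in ["42", "None", "True", "...", "()", "{}"]:
    print(f"{source:5} tokens={token_kinds(source)} ast={node_kind(source)}")
```

Only "42" comes out of the tokenizer as a NUMBER; the rest are names or operators as far as lexical analysis is concerned, even though the parser turns each of them into a node that stands for a fixed value or an empty container.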

And using the word "literal" somewhat loosely isn't confusing anywhere. Where a more specific definition is needed, as when documenting the lexical analysis phase of the language, a specific definition is given.

And this is what allows ast.literal_eval to refer to "the following Python literal structures: strings, bytes, numbers, tuples, dicts, sets, booleans, and None" instead of having to say "the following Python literal structures: strings, bytes, and numbers; the negation of a literal number; the addition or subtraction of a non-imaginary literal number and an imaginary literal number; expression lists containing at least one comma; empty parentheses; the following container displays when not containing comprehensions: lists, dicts, sets; the keywords True, False, and None".
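A short illustration of what that looser wording covers in practice. This is a sketch; is_literal_structure is a hypothetical helper of mine, not part of the ast module:

```python
import ast


def is_literal_structure(source):
    """Return True if ast.literal_eval accepts the source string."""
    try:
        ast.literal_eval(source)
    except (ValueError, SyntaxError):
        return False
    return True


# None of these are all "literals" in the lexical-analysis sense,
# but ast.literal_eval treats every one as a literal structure.
accepted = ["'spam'", "b'spam'", "-2", "1+2j", "()", "(1,)",
            "{}", "{1, 2}", "[1, [2, 3]]", "True", "None"]

# A comprehension and a function call are not literal structures.
rejected = ["[x for x in ()]", "len('spam')"]

for source in accepted + rejected:
    print(f"{source:18} {is_literal_structure(source)}")
```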

I don't think that's a bad thing. If you want to know what the "literal structure... None" means, it's easy to find out, and the fact that None is tokenized as a keyword rather than as a literal does not hamper you in any way. If you actually need to write a tokenizer, then the fact that None is tokenized as a keyword makes a difference--and you can find that out easily as well.

>> and the documentation is not consistent.
> 
> I'd call it a bit sloppy in places.

I wouldn't call it sloppy. I'd call it somewhat loose and informal in places, but that's often a good thing.

>> There
>> are at least two places (not counting tutorial and howtos) that
>> Python 3.4 refers to list or dict literals. (That's not based on a
>> search; someone wrote a StackOverflow question asking what those two
>> places meant.)
> 
> Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people.

But it doesn't confuse people in any relevant way.

The user who asked that question had no problem figuring out how to interpret code that includes a (), or even how that code should be and is compiled. He could have written a Python interpreter with the knowledge he had. Maybe he couldn't have written a specification, but who cares? He doesn't need to.

>> This is similar to the fact that Python doesn't actually define the
>> semantics of numeric literals anywhere.
> 
> I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex.  There are 3 types of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0."  Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float'.

Yes, and they should also be able to tell that the integer literal "42" should evaluate to an int whose value is equal to 42, and that "the value may be approximated in the case of floating point" means that the literal "1.2" should evaluate to the float whose value is closest to 1.2 rather than some different approximation, and so on.
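Those expectations are easy to check. The sketch below assumes Python 3.9 or later for math.nextafter, and the "closest float" result reflects how CPython converts float literals rather than anything the reference manual spells out:

```python
import math
from fractions import Fraction

# The obvious type matches: integer -> int, float -> float, imaginary -> complex.
print(type(42) is int, type(1.2) is float, type(2j) is complex)

# "An imaginary literal yields a complex number with a real part of 0.0."
print((2j).real == 0.0)

# Is the float for "1.2" at least as close to 12/10 as either neighbouring float?
exact = Fraction(12, 10)
error = abs(Fraction(1.2) - exact)
neighbours = (math.nextafter(1.2, math.inf), math.nextafter(1.2, -math.inf))
closest = all(error <= abs(Fraction(n) - exact) for n in neighbours)
print(closest)
```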

But the documentation doesn't actually define any of that. It doesn't have to, because it assumes it's being read by a non-idiot who's capable of programming Python (and won't deliberately make stupid decisions in interpreting it just because he's technically allowed to).

The C++ specification defines all of that, and more (that the digits are interpreted with the leftmost as most significant, that the runtime value of an integer literal is not an lvalue, that it counts as a compile-time constant value, and so on). It attempts to make no assumptions at all (and there have been cases where C++ compiler vendors _have_ made deliberately obtuse interpretations just to make a point about the standard).

That's exactly why reference documentation is more useful than a specification: because it leaves out the things that should be obvious to anyone capable of programming Python. To learn how integer literals work in Python, I need to look at two short and accessible paragraphs; to learn how integer literals work in C++, I have to read 2 full-page sections plus parts of at least 2 others, all written in impenetrable legalese.