[Python-ideas] User-defined literals

Tue Jun 2 21:03:25 CEST 2015

This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++.

In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery.

I explored convertible literals a while ago, and I'm pretty sure that approach doesn't work in a duck-typed language. But the C++ design does work, as long as you're willing to have the conversion (including the lookup of the conversion function itself) done at runtime.

Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`.
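
To make the rewrite concrete, here is a minimal sketch of what a module using decimal literals might contain. The body of `literal_d` is an assumption for illustration (the proposal only fixes the name and the single string argument), and the calls are written out in their desugared form because the suffix syntax itself doesn't exist today.

```python
from decimal import Decimal

def literal_d(text):
    # Hypothetical conversion function: it receives the literal's source
    # text as a string and decides what object to build from it.
    return Decimal(text)

# Under the proposal, `1.2d` and `"1.2"d` would both be rewritten to this call.
price = literal_d('1.2')
total = price + literal_d('0.1')
print(total)
```

The function sees the text `'1.2'` rather than the float `1.2`, which is what lets it build an exact Decimal.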

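Of course, because the number token is matched greedily, `0x12decimal` becomes `literal_imal('0x12dec')` (`d`, `e`, and `c` are hex digits), and `21jump` becomes `literal_ump('21j')` (`j` is the imaginary suffix). Neither is at all useful, and both are potentially confusing, but I don't think that would be a serious problem in practice.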
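
The splitting behaviour above can be sketched with a simplified regular expression. This is only an approximation of Python's number grammar, written for illustration; `NUMBER` and `split_literal` are hypothetical names, and the real tokenizer is more involved.

```python
import re

# Simplified pattern for numeric literals (an assumption, not the full grammar).
NUMBER = re.compile(r"""
    0[xX][0-9a-fA-F]+                                  # hex integer
  | 0[oO][0-7]+                                        # octal integer
  | 0[bB][01]+                                         # binary integer
  | (?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?[jJ]?       # int/float, optional imaginary j
""", re.VERBOSE)

def split_literal(source):
    """Split text like '1.2d' into (number_text, suffix_name)."""
    match = NUMBER.match(source)
    if match is None:
        raise ValueError("no numeric literal at start of %r" % source)
    number, suffix = source[:match.end()], source[match.end():]
    if not suffix.isidentifier():
        raise ValueError("suffix %r is not an identifier" % suffix)
    return number, suffix

for text in ['1.2d', '0x12decimal', '21jump']:
    number, suffix = split_literal(text)
    # Show the call the proposal would rewrite each literal into.
    print("%s -> literal_%s(%r)" % (text, suffix, number))
```

The greedy match takes as much of the text as still forms a valid number, so the suffix is whatever is left over.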

Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from … import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal.
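
The scoping consequence can be imitated today by evaluating the desugared call in two different namespaces. This is a sketch under assumptions: `evaluate_desugared` is a hypothetical helper that stands in for a module's global scope, and `literal_d` is the same illustrative function as before.

```python
from decimal import Decimal

def literal_d(text):
    # Hypothetical conversion function for the `d` suffix.
    return Decimal(text)

def evaluate_desugared(call_source, namespace):
    # Evaluate an already-desugared literal call as if `namespace`
    # were the globals of the module containing the literal.
    return eval(call_source, namespace)

module_with_import = {'literal_d': literal_d}   # did `from … import literal_d`
module_without_import = {}                      # forgot the import

value = evaluate_desugared("literal_d('1.2')", module_with_import)

error = None
try:
    evaluate_desugared("literal_d('1.2')", module_without_import)
except NameError as exc:
    # The failure appears only when the literal is evaluated, not at compile time.
    error = exc

print(value, type(error).__name__)
```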

Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary.
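
As a sketch of the repr-passing variant mentioned above: if string literals were passed as their repr, `1.2d` would still call `literal_d('1.2')`, while `"1.2"d` would call `literal_d("'1.2'")`, and the function could tell them apart by the leading quote. The function body and its return shape are assumptions for illustration, and string prefixes such as `b` or `r` are ignored here.

```python
import ast
from decimal import Decimal

def literal_d(text):
    # Hypothetical variant: string literals arrive as their repr, so a
    # leading quote character means the user typed a string.
    if text[:1] in ('"', "'"):
        kind = 'string'
        payload = ast.literal_eval(text)   # recover the string's contents
    else:
        kind = 'number'
        payload = text
    return kind, Decimal(payload)

from_number = literal_d('1.2')          # what `1.2d` would pass
from_string = literal_d(repr('1.2'))    # what `"1.2"d` would pass
print(from_number, from_string)
```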

Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions.

I've built a quick-and-dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, it only handles numbers, allows whitespace between the number and the name, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it.

My feeling is that this would be useful, but the problems are not surmountable without much bigger changes, and there's no obvious better design that avoids them. But I'm interested to see what others think.