[Python-ideas] Re: Custom string prefixes

26 Aug 2019

      ...
On Aug 26, 2019, at 18:41, stpasha@gmail.com wrote:
Thanks, Andrew, for your feedback. I didn't even think about string **suffixes**, but
clearly they can be implemented together with the prefixes for additional flexibility.
What about _instead of_ rather than _together with_? Half of Stephen’s objections are related to the ambiguity (to a human, even if not to the parser) of user prefixes in the (potential) presence of the builtin prefixes. None of those even arise with suffixes. Anyway, maybe you already have good answers for all of those objections, but if not…

Also, there’s at least one mainstream language (C++) that allows user suffixes and has literal syntax otherwise somewhat like Python’s, and the proposals for other languages like Rust generally seem to be trying to do “like C++ but minus all the usual C++ over-complexity”. Are there actual examples of languages with user prefixes?

The only different designs I know of rely on the static type of the evaluation context. (For example, in Swift, you can just statically type `23 : km` or `"abc]*" : regex`, or even just pass the literal to a function that’s declared or inferred to take a regex if that happens to be readable in your use case, so there’s no need for a suffix syntax.) Which is neat, but obviously not applicable to Python.
...
And your idea that `<string literal> <suffix>` is conceptually no different than
`<numeric literal> <suffix>` is absolutely insightful.
Well, back in 2015 I probably just stole the idea from C++. :)

Another question this raises, which I just remembered: the word “literal” has three overlapping but distinct meanings in Python. Which one do we actually mean here? In particular, are container displays “literals”? For that matter, is -2 even a literal?

Also, from what I remember, either in 2013 or in 2015, the discussion got side-tracked over people not liking the word “literal” to mean “something that’s actually the result of a runtime function call”. That may be less of a problem after f-strings (which are called literals in the PEP; not sure about the language reference), but last time around, bringing up the fact that “-2” is actually a function call didn’t sway anyone. So, maybe I shouldn’t be using the word “literal” this time, and I really hope it doesn’t ruin your proposal…
...
Speaking of string suffixes, flags on regular expressions immediately come to mind.
For example `rx"(abc)"ig` could create a regular expression that performs global 
case-insensitive search.
That’s an interesting idea. And that’s something you can’t do with a single-affix design; you need prefixes and suffixes, unless you have some kind of separator for chaining, or only allow single characters.
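
To make the regex case concrete, here is a minimal sketch of a handler that `rx"(abc)"ig` could be translated into, assuming the `a("bcd", suffix="e")` call shape discussed further down. The name `rx`, the flag table, and the treatment of `g` are all my assumptions for illustration; Python's `re` has no "global" flag, so the sketch just ignores it.

```python
import re

# Hypothetical mapping from one-character suffix flags to re module flags.
_FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


def rx(pattern, suffix=""):
    """Hypothetical handler for rx"..."<flags> (not an existing API)."""
    flags = 0
    for ch in suffix:
        if ch == "g":
            # Assumption: "g" is accepted but ignored, since findall() and
            # finditer() already search globally in Python.
            continue
        if ch not in _FLAG_MAP:
            raise ValueError(f"unknown regex flag {ch!r}")
        flags |= _FLAG_MAP[ch]
    return re.compile(pattern, flags)


# What rx"(abc)"ig might desugar to under these assumptions:
pattern = rx("(abc)", suffix="ig")
```

Note that each flag here is a single character, which is exactly the "only allow single characters" restriction mentioned above; multi-character flags would need some separator.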
...
...
I don’t think you can fairly discuss this idea without getting at least a
_little_ bit into the implementation details.
Right. So, the first question to answer is what the compiler should do when it sees
a prefixed (suffixed) string? That is, what byte-code should be emitted when the
compiler sees `lambda: a"bcd"e` ?
In one approach, we'd want this expression to be evaluated at compile time, similar
to how f-strings work. However, how would the compiler know what prefix "a" means
exactly? There has to be some kind of directive to tell the compiler that. For example,
imagine the compiler sees near the top of the file
#pragma from mymodule import a
It would then import the symbol `a`, call `a("bcd", suffix="e")`. This would return an
AST tree that will be plugged in place of the original string.
This solution allows maximum efficiency, but seems inflexible and deeply invasive.
Another approach would defer the construction of objects to run time. Though
not as efficient, it would allow loading prefixes at run-time. In this case `a"bcd"e` can
be interpreted by the compiler as if it was
a("bcd", suffix="e")
where symbol `a` is to be looked up in the local/global scope.
My hack works basically like this. The compiler just converts it to a function call, which is looked up normally. I think that’s the right tack here. IIRC, my hack translates a D suffix into a call to something like _user_literal_D, which solves the problem with accidental pollution of the namespace. But this does mean that any code that wants to use the D suffix has to `from decimal_literals import *`, or `2.3D` raises a NameError about nothing named _user_literal_D. (Either that, or someone has to inject it into builtins…) I’m not sure whether that’s user-friendly enough.

Anyway, I think your registry idea makes more sense. Then `2.3D` effectively just means `__user_literals__['D']('2.3')`, and there’s no namespace pollution at all.
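
A rough sketch of the registry version, in plain Python. The `__user_literals__` name comes from the paragraph above; the `resolve_literal` helper and the choice of NameError for an unknown suffix are assumptions of mine, standing in for whatever the compiler would actually emit.

```python
from decimal import Decimal

# The registry named above; here it is just a module-level dict.
__user_literals__ = {}
__user_literals__["D"] = Decimal


def resolve_literal(text, suffix):
    """Hypothetical helper: roughly what `2.3D` would mean at run time."""
    try:
        handler = __user_literals__[suffix]
    except KeyError:
        # Assumption: an unregistered suffix fails like an undefined name.
        raise NameError(f"no user literal suffix {suffix!r} registered") from None
    return handler(text)


value = resolve_literal("2.3", "D")  # same as Decimal("2.3")
```

The handler receives the source text of the literal, not a float, so `2.3D` does not lose precision on the way to Decimal.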
...
For this approach to work, we'd create a
new code op, so that `a"bcd"e` would become
    0 LOAD_CONST        1 ('a', 'bcd', 'e')
    2 STR_RESOLVE_TAG   0
where `STR_RESOLVE_TAG` would effectively call `__resolve_tag__()` special
method. The method would search for `a` in the registry of known string tags,
and then pass the tuple to the corresponding constructor.
Do we even need that? It’s true that most things in Python translate reasonably directly to bytecodes, but in this case it might be easier to just compile to existing bytecodes to look up and call the function.
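
Here is a sketch of what "compile to existing bytecodes" could look like, done at the AST level instead of inside the compiler. The `desugar` function and the `<tagged-literal>` filename are hypothetical; the point is only that the resulting code object needs no new opcode.

```python
import ast


def desugar(prefix, body, suffix):
    """Build and compile the expression prefix(body, suffix=suffix).

    Hypothetical stand-in for what the compiler might do on seeing a"bcd"e.
    """
    call = ast.Call(
        func=ast.Name(id=prefix, ctx=ast.Load()),
        args=[ast.Constant(value=body)],
        keywords=[ast.keyword(arg="suffix", value=ast.Constant(value=suffix))],
    )
    tree = ast.Expression(body=call)
    ast.fix_missing_locations(tree)
    return compile(tree, "<tagged-literal>", "eval")


def a(text, suffix=""):
    # Toy handler so the example has something to call.
    return (text, suffix)


code = desugar("a", "bcd", "e")
result = eval(code, {"a": a})  # `a` is looked up like any other global
```

Running `dis.dis(code)` on the result shows only the ordinary name-load and call instructions; the exact opcode names vary between CPython versions.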
...
There will, of course, be a method to register new tags. Something like
str.__register_tag__('a', MyAObject)
If the params are (handler, name=None), and None means to use the __name__ of the handler as the tag, then you can use it as a decorator:

    @__register_tag__
    def D(decimal_string):
        return decimal.Decimal(decimal_string)

Although this may not be the best example, because it might actually be clearer (as well as more efficient) to just register the constructor:

    __register_tag__(decimal.Decimal, 'D')

… but I suspect many examples won’t be just a matter of calling a constructor on the string.
...
As for suffix-only literals, we can treat them as if they begin with an underscore.
Thus, `1/3f` would be equivalent to
1/_f(3)
Does that mean you can’t actually register a prefix named `_f`? Or that, if you do, it also registers a suffix named `f`?

Also, I think for most non-single-letter suffixes you’d actually want an underscore at the start of the suffix. See C++ for lots of examples, but for a quick illustration, compare these:

    c = 2.99792458e8mps
    c = 2.99792458e8_mps

    c = 299_792_458mps
    c = 299_792_458_mps

The _mps suffix looks a lot better than the mps suffix, doesn’t it? But would you want the function to have to be named __mps with two underscores?
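
To see where the double underscore comes from, here is a small sketch that splits a numeric token into number and suffix and then applies the "prepend an underscore" rule quoted above. The regex covers only decimal integers and floats, and both function names are made up for illustration.

```python
import re

# Simplified decimal literal followed by an optional identifier-like suffix.
_NUMBER = re.compile(
    r"""
    (?P<number>
        \d+(?:_\d+)*                  # integer part, underscores between digits
        (?:\.\d+(?:_\d+)*)?           # optional fraction
        (?:[eE][+-]?\d+(?:_\d+)*)?    # optional exponent
    )
    (?P<suffix>[A-Za-z_]\w*)?         # optional trailing suffix
    """,
    re.VERBOSE,
)


def split_literal(token):
    """Hypothetical tokenizer step: return (number_text, suffix_text)."""
    m = _NUMBER.fullmatch(token)
    if m is None:
        raise ValueError(f"not a suffixed number: {token!r}")
    return m.group("number"), m.group("suffix") or ""


def handler_name(suffix):
    """The quoted rule: a suffix-only literal calls '_' + suffix."""
    return "_" + suffix


number, suffix = split_literal("299_792_458_mps")
name = handler_name(suffix)  # two leading underscores for the _mps suffix
```

Since an underscore inside a number must be followed by a digit, the `_` before `mps` can only belong to the suffix, so the split itself is unambiguous to the parser; the awkward part is just the resulting function name.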

It may be worth coming up with the most compelling examples and then working out what feature set would support as many as possible, rather than trying to work out the ultimate feature set first and then see what we can do with it. It’s probably worth stealing liberally from the C++ discussion (and any other languages that have similar features) as well as the 2013 Python discussion, but off the top of my head:

 * Decimal, Fraction, np.float32, mpz, …
 * Path objects
 * Windows native Path objects, possibly with “really raw” processing to allow trailing backslashes
 * regex, possibly with flags, possibly with “really raw” backslashes
 * “Really raw” strings in general.
 * JSON (register the stdlib or simplejson or ujson), XML (register ETree or lxml or bs4 or whatever you want), HTML, etc.
 * unit suffixes for quantities

Andrew Barnert