On Aug 27, 2019, at 18:19, Chris Angelico
wrote: On Wed, Aug 28, 2019 at 10:52 AM Andrew Barnert
wrote: On Aug 27, 2019, at 14:41, Chris Angelico
wrote: All the examples about Windows paths fall into one of two problematic boxes:
1) Proposals that allow an arbitrary prefix to redefine the entire parser - basically impossible for anything sane
2) Proposals that do not allow the prefix to redefine the parser, and are utterly useless, because the rest of the string still has to be valid.
3) Proposals that do not allow the prefix to redefine the parser for the entire program, but do allow it to manually parse anything the tokenizer can recognize as a single (literal) token.
As I said, I haven’t tried to implement this example as I have with the other examples, so I can’t promise that it’s doable (with the current tokenizer, or with a reasonable change to it). But if it is doable, it’s neither insane nor useless. (And even if it’s not doable, that’s just two examples that affixes can’t solve—Windows paths and general “super-raw strings”. They still solve all of the other examples.)
So what is the definition of "a single literal token" when you're creating a path-string? You want this to be valid:
x = path"C:\"
For this to work, the path prefix has to redefine the way the parser finds the end of the token, does it not?
I’m not sure (maybe about 60% at best), but I think the last time I checked this, the tokenizer actually hits the error without munching the rest of the file. If I’m wrong, then you would need to add a “really raw string literal” builtin that any affixes wanting really raw string literals could use, but that’s all you’d have to do. And I really don’t think it’s worth going this in-depth on just one of the possible uses that I tossed off as an aside, especially without actually sitting down and testing anything.
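For what it’s worth, the difficulty being discussed is easy to demonstrate with today’s tokenizer: even a raw string literal cannot end in a single backslash, because the backslash still prevents the quote from terminating the token. A minimal check (this just probes current behavior; it is not part of the proposal):

```python
# Even r"C:\" is rejected today: the trailing backslash keeps the closing
# quote from ending the literal, so the tokenizer reports a syntax error.
try:
    compile(r'x = r"C:\"', "<demo>", "exec")
    result = "accepted"
except SyntaxError:
    result = "rejected by the tokenizer"

print(result)  # prints: rejected by the tokenizer
```

So any `path"C:\"` spelling would need either the hypothetical “really raw string literal” builtin mentioned above, or a change to how the tokenizer finds the end of the token.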
Look at the plethora of suffixes C has for number and character literals. Look at how many things people still can’t do with them that they want to.
I don't know how many there are. The only ones I can think of are "f" for single-precision float, and the long and unsigned suffixes on integers.
Off the top of my head, there are also long long integers, and long doubles, and wide and three Unicode suffixes for char. Those probably aren’t all of them. And your compiler probably has extensions for “legacy” suffixes and nonstandard types like int128 or decimal64 and so on.
Python doesn't have these because very few programs need to care about whether a float is single-precision or double-precision, or how large an int is.
Right, but the issue isn’t which ones, but how many. C doesn’t have decimals or fractions, and other things like datetime objects have been suggested in this thread, and even more in the two earlier threads. If there are too many useful kinds of constants, there are too many to make them all builtins.
Do you think Python users are incapable of the kind of restraint and taste shown by C++ users, and therefore we can’t trust them with a tool that might possibly (but we aren’t sure), if abused badly enough, make code harder to visually parse?
People can be trusted with powerful features that can introduce complexity. There's just not a lot of point introducing a low-value feature that adds a lot of complexity.
But it really doesn’t add a lot of complexity. If you’re not convinced that really-raw string processing is doable, drop that. Since the OP hasn’t given a detailed version of his grammar, just take mine: a literal token immediately followed by one or more identifier characters (that couldn’t have been munched by the literal) is a user-suffix literal. This is compiled into code that looks up the suffix in a central registry and calls it with the token’s text. That’s all there is to it. Compare that to adding Decimal (and Fraction, as you said last time) literals when the types aren’t even builtin. That’s more complexity, for less benefit. So why is it better?
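The registry-and-lookup step described above can be sketched in plain Python. The names here (`register_suffix`, `apply_suffix`, the registry dict) are my own invention for illustration; a real implementation would live in the compiler, which would emit the equivalent of the `apply_suffix` call for a token like `1.1d`:

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical central registry: suffix name -> callable taking the
# raw token text. These names are illustrative, not a proposed API.
_suffix_registry = {}

def register_suffix(name, func):
    _suffix_registry[name] = func

def apply_suffix(text, suffix):
    # What compiled code for a user-suffix literal like `1.1d` might
    # expand to: look up the suffix and call it with the token's text.
    return _suffix_registry[suffix](text)

register_suffix("d", Decimal)
register_suffix("f", Fraction)

print(apply_suffix("1.1", "d"))  # prints: 1.1
print(apply_suffix("1/3", "f"))  # prints: 1/3
```

Note that the suffix function receives the literal’s text, not a parsed value, so `Decimal("1.1")` is exact rather than a rounded float, which is the whole point of a Decimal literal.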