Nick Coghlan writes:
It doesn't bother me that much personally, especially if it was a general comma delimited capability that also worked with other code point names,
I think it should bother you, though. It's not a problem for Python core developers, it's true. Similarly, ISO 2022 was a great idea in theory, and works fine for communication of text over streams. The problem is when you want to embed that stream in some higher-level protocol. So, for example, the original space-separated syntax breaks one-argument split-string, while your comma-separated version breaks CSV. You could fix both of those by using no separator and simply finishing the current code point on encountering "U+" or "}", but I doubt anybody would find that variant appealing. Now, for program literals this isn't going to matter because a string will be converted to internal representation by the compiler, and the program never sees that syntax. But what about applications like web frameworks which often eval client-supplied strings? I hope we are not going to recommend they eval them before validating them!<wink/>
but my inclination is to call YAGNI on the additional complexity.
"Using 'complexity' to refer to this syntax isn't really valid though - what it is, is 'complicated'."<wink/>
Using "modal encoding" to refer to that change isn't really valid though
No, it's quite correct, at least in ISO-land. There, a modal encoding is one which must maintain state across *code points*. The single- code-point "\N" syntax needs to maintain state across *code units*, but when it's done with a code *point*, it's done - there's no state to worry about before starting to parse the next one. By your definition, UTF-8 is modal, but that doesn't seem a very useful categorization to me.