On Thu, Aug 1, 2013 at 9:15 PM, Stephen J. Turnbull
Alexander Belopolsky writes:
On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban
wrote: ..
This misses the point of adding the code point type prefix.
Not really. That would just pass the responsibility for enforcing consistency to linters, instead of the translator.
I have not seen a linter yet that would suggest that "\x41" should be written as "A". The choice of the best literal syntax requires human judgement. A linter cannot tell you when 1.00 is better than 1.0 or 1. I would choose a more verbose \N{control-NNNN} over shorter \uNNNN when I want to make it obvious to the human reader of my code that I use a control character rather than anything else.
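As a minimal sketch of why this is a judgement call rather than something a tool can decide, the snippet below uses only escapes that Python already accepts (the proposed \N{control-NNNN} form is not shown as runnable code). The name `spellings` is just an illustrative variable.

```python
# Four spellings of the same one-character string. Once compiled they are
# indistinguishable, so only the author knows which one reads best.
spellings = ["A", "\x41", "\u0041", "\N{LATIN CAPITAL LETTER A}"]
all_same = len(set(spellings)) == 1

# Numeric literals behave the same way: equal values, different intent.
numbers_same = (1.00 == 1.0 == 1)

print(all_same, numbers_same)
```

Both comparisons hold, which is the point: the spelling carries information for the reader only.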
You can't just make this a syntax error because a code point may be reserved in one Python version and a letter in another, depending on which versions of the Unicode tables are being used by those versions of Python.
That's true, but why would you write \N{reserved-NNNN} instead of \uNNNN to begin with? I would assume you would only choose a longer spelling when it is important for your program that you use a reserved character and your program will not work correctly with the UCD version where the NNNN code point is assigned.
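To make the version dependence concrete, here is a small sketch using the frozen Unicode 3.2.0 tables that the standard unicodedata module exposes as `ucd_3_2_0`. The choice of U+1E9E (added in Unicode 5.1) is mine, purely for illustration, and the second result assumes an interpreter built with Unicode 5.1 or later tables.

```python
import unicodedata

ch = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S, added in Unicode 5.1

# Category under the frozen Unicode 3.2.0 tables: the code point is
# still unassigned there, i.e. "reserved".
old_category = unicodedata.ucd_3_2_0.category(ch)

# Category under the tables this interpreter was built with.
new_category = unicodedata.category(ch)

print(unicodedata.ucd_3_2_0.unidata_version, old_category)
print(unicodedata.unidata_version, new_category)
```

The same code point is Cn (unassigned) under the old tables and Lu under the current ones, so a strict \N{reserved-1E9E} would be accepted with one table version and rejected with the other.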
That would conflict with Unicode itself, which says that unknown code points must be treated as characters. This is way too fragile to be allowed to cause syntax errors.
You can always avoid syntax errors by using \uNNNN. If you choose to specify the character type you hopefully do it for a good reason.
..
It might on rare occasions be useful to be strict about fixed-for-all-time types like surrogate and private use.
There are only five type prefixes: control-, reserved-, noncharacter-, private-use-, and surrogate-. With the possible exception of reserved-, on the rare occasion when you want to be explicit about the character type, it is useful to be strict. In the case of reserved-, I cannot think of any legitimate use for a reserved character in a string literal, so if strictness is a problem in this case, I would disallow \N{reserved-NNNN} altogether.
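A rough sketch of how a translator could check the five prefixes against the Unicode tables follows. The helper name `code_point_label` is hypothetical and not part of any existing API; it leans on unicodedata.category plus the fixed noncharacter ranges (U+FDD0..U+FDEF and the last two code points of every plane).

```python
import unicodedata

# General_Category values that correspond to a type prefix.
_LABELS = {"Cc": "control", "Cs": "surrogate", "Co": "private-use", "Cn": "reserved"}

def code_point_label(cp):
    """Return the type prefix for code point cp, or None if cp is an
    ordinary assigned character (hypothetical helper, for illustration)."""
    # Noncharacters have category Cn in the tables, so test them first.
    if 0xFDD0 <= cp <= 0xFDEF or cp & 0xFFFE == 0xFFFE:
        return "noncharacter"
    return _LABELS.get(unicodedata.category(chr(cp)))

for cp in (0x0007, 0xD800, 0xE000, 0xFFFE, 0x0041):
    print(f"U+{cp:04X}", code_point_label(cp))
```

Only the "reserved" answer can change between table versions; the other four are determined by ranges that no longer move.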
(But even those weren't fixed for all time in the past!)
Now they are: the control- property has been immutable since version 1.1.5, surrogate- and private-use- since 2.0, and noncharacter- since 3.1.0. (See http://www.unicode.org/policies/stability_policy.html.) Moreover, since 2.1.0, "The enumeration of General_Category property values is fixed. No new values will be added."