[Python-ideas] Support Unicode code point notation
Stephen J. Turnbull
stephen at xemacs.org
Fri Aug 2 03:15:42 CEST 2013
Alexander Belopolsky writes:
> On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban <bruce at leapyear.org> wrote:
>> I'm +/-0 on having 'control-' and 'reserved-' etc. simply being
>> different spellings of 'U+' so that '\N{control-0021}' == '\N{U+0021}'
>> == '\x21' == '!' even though that isn't a control character.
> This misses the point of adding the code point type prefix.
Not really. That would just pass the responsibility for enforcing
consistency to linters, instead of the translator. You can't just
make this a syntax error because a code point may be reserved one
Python version and a letter in another, depending on which versions of
the Unicode tables are being used by those versions of Python. That
would conflict with Unicode itself, which says that unknown code
points must be treated as characters. This is way too fragile to be
allowed to cause syntax errors.
> If you fat-finger \N{control-0021} instead of intended \N{control-0012}
> you would want a quick syntax error rather than an obscure bug.
> Similarly, when you are reading someone else's code, you don't want
> to consult the code table every time you see \N{control-NNNN} to
> assure that this is really a control character rather than a
> surrogate- or private-use- one.
+0 on Bruce's idea, -1 on syntax errors
It might be on rare occasions be useful to be strict about fixed-for-
all-time types like surrogate and private use. (But even those
weren't fixed for all time in the past!) Really, this is an editor or
linter function.
More information about the Python-ideas
mailing list