[Python-ideas] Support Unicode code point notation

Alexander Belopolsky alexander.belopolsky at gmail.com
Fri Aug 2 04:14:31 CEST 2013


On Thu, Aug 1, 2013 at 9:15 PM, Stephen J. Turnbull <stephen at xemacs.org>
wrote:
>
> Alexander Belopolsky writes:
>  > On Thu, Aug 1, 2013 at 8:04 PM, Bruce Leban <bruce at leapyear.org> wrote:
> ..
>  > This misses the point of adding the code point type prefix.
>
> Not really.  That would just pass the responsibility for enforcing
> consistency to linters, instead of the translator.


I have not seen a linter yet that would suggest that "\x41" should be
written as "A".  The choice of the best literal syntax requires human
judgement.  A linter cannot tell you when 1.00 is better than 1.0 or 1.  I
would choose the more verbose \N{control-NNNN} over the shorter \uNNNN when
I want to make it obvious to the human reader of my code that I am using a
control character rather than anything else.
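
A small sketch of the "same value, different spelling" point, using only
escapes that Python already supports (the \N{control-NNNN} form is a
proposal and is deliberately not used here):

```python
# Four spellings of the same one-character string.  The compiler sees
# no difference between them; only the human reader does.
spellings = ["A", "\x41", "\u0041", "\N{LATIN CAPITAL LETTER A}"]
assert all(s == "A" for s in spellings)

# The numeric analogue: equal values, different intent on the page.
assert 1.00 == 1.0 == 1
```

Since the literals compare equal, a tool has no semantic basis for
preferring one over another; that choice is a matter of readability.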


>
>  You can't just
> make this a syntax error because a code point may be reserved one
> Python version and a letter in another, depending on which versions of
> the Unicode tables are being used by those versions of Python.


That's true, but why would you write \N{reserved-NNNN} instead of \uNNNN to
begin with?  I would assume you would only choose a longer spelling when it
is important for your program that you use a reserved character and your
program will not work correctly with the UCD version where the NNNN code
point is assigned.
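
The dependence on the UCD version can be observed from Python itself.  The
following is a minimal sketch; the helper name is_unassigned is
hypothetical, and U+0378 is only an illustrative code point whose status
depends on the Unicode tables bundled with the interpreter:

```python
import unicodedata

def is_unassigned(cp):
    """Return True if cp has General_Category Cn in this interpreter's UCD.

    Note that Cn covers both reserved code points and noncharacters.
    """
    return unicodedata.category(chr(cp)) == "Cn"

# The answer for a given code point may differ between Python versions,
# because each version ships its own copy of the Unicode database.
print("UCD version:", unicodedata.unidata_version)
print("U+0378 is Cn here:", is_unassigned(0x0378))
```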

>
>  That
> would conflict with Unicode itself, which says that unknown code
> points must be treated as characters.  This is way too fragile to be
> allowed to cause syntax errors.


You can always avoid syntax errors by using \uNNNN.  If you choose to
specify the character type, you hopefully do it for a good reason.
>
> ..
>
> It might on rare occasions be useful to be strict about
> fixed-for-all-time types like surrogate and private use.

There are only five type prefixes: control-, reserved-, noncharacter-,
private-use-, and surrogate-.  With the possible exception of reserved-, on
the rare occasion when you want to be explicit about the character type, it
is useful to be strict.  In the case of reserved-, I cannot think of any
legitimate use for a reserved character in a string literal, so if
strictness is a problem in this case, I would disallow \N{reserved-NNNN}
altogether.
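
What strict checking could look like, as a run-time sketch rather than the
proposed compile-time behaviour.  The function resolve_label and its
helpers are hypothetical names; the type test relies on
unicodedata.category plus the fixed noncharacter ranges, and the label is
spelled noncharacter-:

```python
import re
import unicodedata

def _is_noncharacter(cp):
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# One predicate per type prefix.
_CHECKS = {
    "control": lambda cp: unicodedata.category(chr(cp)) == "Cc",
    "private-use": lambda cp: unicodedata.category(chr(cp)) == "Co",
    "surrogate": lambda cp: unicodedata.category(chr(cp)) == "Cs",
    "noncharacter": _is_noncharacter,
    # Reserved: unassigned (Cn) but not a noncharacter.  This is the one
    # check whose answer depends on the UCD version in use.
    "reserved": lambda cp: (unicodedata.category(chr(cp)) == "Cn"
                            and not _is_noncharacter(cp)),
}

_LABEL = re.compile(
    r"(control|private-use|surrogate|noncharacter|reserved)"
    r"-([0-9A-Fa-f]{4,6})"
)

def resolve_label(label):
    """Return the character for a label such as 'control-000A'.

    Raises ValueError if the label is malformed or if the code point
    does not have the type that the prefix claims.
    """
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError("malformed code point label: %r" % label)
    kind, hexdigits = m.groups()
    cp = int(hexdigits, 16)
    if cp > 0x10FFFF:
        raise ValueError("code point out of range: %r" % label)
    if not _CHECKS[kind](cp):
        raise ValueError("U+%04X is not a %s code point" % (cp, kind))
    return chr(cp)
```

For example, resolve_label("control-000A") returns "\n", while
resolve_label("control-0041") raises ValueError because U+0041 is a
letter.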

>  (But even those weren't fixed for all time in the past!)


Now they are: the control- property has been immutable since version 1.1.5,
surrogate- and private-use- since 2.0, and noncharacter- since 3.1.0.  (See
<http://www.unicode.org/policies/stability_policy.html>.)  Moreover, since
2.1.0, "The enumeration of General_Category property values is fixed. No
new values will be added."
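
Because these sets are immutable, they can be checked against any
interpreter's Unicode tables.  A minimal sketch using one pass over all
code points, assuming only the standard unicodedata module:

```python
import sys
import unicodedata
from collections import Counter

# Tally the General_Category of every code point in this interpreter.
counts = Counter(
    unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1)
)

# Control (Cc) is exactly U+0000..U+001F and U+007F..U+009F.
controls = [
    cp for cp in range(0x100) if unicodedata.category(chr(cp)) == "Cc"
]
assert controls == list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))

# Surrogate (Cs) is exactly U+D800..U+DFFF.
assert counts["Cs"] == 0xDFFF - 0xD800 + 1
```

The count for Cn, by contrast, is expected to shrink as new characters are
assigned, which is why reserved- is the odd one out.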