[Python-ideas] Support Unicode code point notation

Fri Aug 2 01:55:47 CEST 2013

On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> I'd never even heard of code point labels before this thread, while the
> "U+" notation is incredibly common.

Nick,

Did you see this part: "A constructed code point label is distinguished
from the designation of the code point itself (for example, “U+0009” or
“U+FFFF”), which is also a unique identifier"?

The purpose of unicodedata.lookup() is to look up a Unicode code point by
name, and "U+NNNN" is not a name - it is "the designation of the code point
itself." There is no need to look up anything if you want to process an
occasional s = "U+FFFF" string: chr(int(s[2:], 16)) will do the job.
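
To make the one-liner concrete, here is a minimal sketch that wraps it in a
function with a prefix check. The name from_designation is made up for
illustration; it is not part of unicodedata or any other module.

```python
def from_designation(s):
    """Turn a 'U+NNNN' designation into the one-character string it denotes.

    Hypothetical helper for illustration; not a stdlib function.
    """
    if not s.upper().startswith("U+"):
        raise ValueError("expected a 'U+NNNN' designation, got %r" % (s,))
    # Everything after the 'U+' prefix is the code point in hexadecimal.
    return chr(int(s[2:], 16))


tab = from_designation("U+0009")
last_bmp = from_designation("U+FFFF")
```

No name table is consulted anywhere: the designation already carries the
code point number, so int() and chr() are all that is needed.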

The original proposal was to allow a \U+NNNN escape as a shortcut for
\U0000NNNN. This is a clear readability improvement, while \N{U+001B}, for
example, is not an improvement over \N{ESCAPE}. However, for more obscure
control characters, \N{control-NNNN} may be clearer than any currently
available spelling. For example, \N{control-001E} is easier to understand
than \036, \x1e, \u001E, \N{RS} or even the most verbose \N{INFORMATION
SEPARATOR TWO}.
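
Neither \U+NNNN nor \N{control-NNNN} is accepted by Python today, so the
following is only a rough sketch of what resolving a "control-NNNN" label
could look like at run time. The function lookup_label is hypothetical, and
the sketch assumes a Python version (3.3 or later) whose unicodedata.lookup()
knows the alias INFORMATION SEPARATOR TWO.

```python
import unicodedata


def lookup_label(label):
    """Hypothetical helper: resolve a 'control-NNNN' code point label."""
    prefix, sep, digits = label.partition("-")
    if prefix != "control" or not sep:
        raise KeyError(label)
    ch = chr(int(digits, 16))
    # Accept only code points whose general category is Cc (control).
    if unicodedata.category(ch) != "Cc":
        raise KeyError(label)
    return ch


rs = lookup_label("control-001E")

# Spellings that Python accepts today; all denote the same character.
same = (
    rs
    == "\036"
    == "\x1e"
    == "\u001E"
    == unicodedata.lookup("INFORMATION SEPARATOR TWO")
)
```

The category check is one possible design choice: it keeps a label such as
"control-0041" from silently resolving to a non-control character.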