[Python-ideas] Support Unicode code point notation
Alexander Belopolsky
alexander.belopolsky at gmail.com
Fri Aug 2 01:55:47 CEST 2013
On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I'd never even heard of code point labels before this thread, while the
> "U+" notation is incredibly common.
Nick,
Did you see this part: "A constructed code point label is distinguished
from the designation of the code point itself (for example, “U+0009” or
“U+FFFF”), which is also a unique identifier"?
The purpose of unicodedata.lookup() is to look up a Unicode code point by
name, and "U+NNNN" is not a name - it is "the designation of the code point
itself." There is no need to look up anything if you want to process an
occasional s = "U+FFFF" string: chr(int(s[2:], 16)) will do the job.
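To make the one-liner concrete, here is a minimal sketch that wraps it in a function. The name parse_code_point and the prefix check are my own additions for illustration; nothing like this exists in the standard library under that name.

```python
def parse_code_point(designation):
    """Turn a 'U+NNNN' designation into a one-character string.

    Hypothetical helper: only the chr(int(s[2:], 16)) step comes from
    the message above; the prefix check is an added safeguard.
    """
    if not designation.upper().startswith("U+"):
        raise ValueError(f"not a U+ designation: {designation!r}")
    # Everything after the "U+" prefix is the hexadecimal code point value.
    return chr(int(designation[2:], 16))


tab = parse_code_point("U+0009")
noncharacter = parse_code_point("U+FFFF")
```

No table lookup is involved: the designation already carries the code point value, so a hexadecimal conversion is all that is needed.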
The original proposal was to allow a \U+NNNN escape as a shortcut for
\U0000NNNN. This is a clear readability improvement while \N{U+001B}, for
example, is not an improvement over \N{ESCAPE}. However, for more obscure
control characters, \N{control-NNNN} may be clearer than any currently
available spelling. For example, \N{control-001E} is easier to understand
than \036, \x1e, \u001E, \N{RS} or even the most verbose \N{INFORMATION
SEPARATOR TWO}.
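For comparison, the sketch below checks that the numeric spellings that work today all denote the same character, and builds the "control-NNNN" label text for it. The \N{control-NNNN} escape itself is only a suggestion in this thread and is not valid Python, so the control_label helper is hypothetical and merely formats the label as a string.

```python
import unicodedata

# Numeric spellings of U+001E that current Python accepts.
spellings = ["\036", "\x1e", "\u001E", "\U0000001E", chr(0x1E)]
rs = spellings[0]
all_equal = all(s == rs for s in spellings)


def control_label(ch):
    """Return a 'control-NNNN' label for a control character.

    Hypothetical helper: it follows the label format quoted in this
    thread and is not part of the unicodedata module.
    """
    if unicodedata.category(ch) != "Cc":
        raise ValueError(f"not a control character: {ch!r}")
    return f"control-{ord(ch):04X}"


label = control_label(rs)
```

The category check relies on unicodedata.category() returning "Cc" for control characters; the label is then just the code point value in four-digit uppercase hexadecimal.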