On Thu, Aug 1, 2013 at 4:55 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
> On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
>> I'd never even heard of code point labels before this thread, while the "U+" notation is incredibly common.

<snip>

> The original proposal was to allow a \U+NNNN escape as a shortcut for \U0000NNNN. This is a clear readability improvement, while \N{U+001B}, for example, is not an improvement over \N{ESCAPE}. However, for more obscure control characters, \N{control-NNNN} may be clearer than any currently available spelling. For example, \N{control-001E} is easier to understand than \036, \x1e, \u001E, \N{RS} or even the most verbose \N{INFORMATION SEPARATOR TWO}.

My reason for suggesting it is that the standard defines these labels for such characters, so it's reasonable to expect lookup to know about them just as it knows about 'EXCLAMATION MARK'. If someone has created data using the standard and passes it to unicodedata.lookup, it should work. I'm +/-0 on having 'control-', 'reserved-', etc. simply be different spellings of 'U+', so that '\N{control-0021}' == '\N{U+0021}' == '\x21' == '!' even though U+0021 isn't a control character. That is, if the data doesn't conform to the standard, it wouldn't necessarily be terrible if lookup did something reasonable rather than raising an exception.

And, I'm only suggesting this be supported on the reading side.
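
To make the lenient behaviour concrete, here is a rough sketch of what I have in mind. It is only an illustration: lenient_lookup and _LABEL are hypothetical names, not part of unicodedata, and I am assuming the label prefixes are the five the standard lists (control, reserved, noncharacter, private-use, surrogate). The prefix is deliberately not checked against the character's actual type.

```python
import re
import unicodedata

# Hypothetical pattern: "U+NNNN" or a code point label such as "control-001E".
# The prefix list is an assumption based on the standard's label names.
_LABEL = re.compile(
    r"(?:U\+|(?:control|reserved|noncharacter|private-use|surrogate)-)"
    r"([0-9A-F]{4,6})",
    re.IGNORECASE,
)


def lenient_lookup(name):
    """Resolve a character name, a 'U+NNNN' string, or a code point label."""
    match = _LABEL.fullmatch(name)
    if match is None:
        # Ordinary names go through the existing lookup, which raises
        # KeyError for names it does not know.
        return unicodedata.lookup(name)
    # Lenient variant: any prefix is treated as another spelling of "U+".
    # chr() raises ValueError for values above 0x10FFFF.
    return chr(int(match.group(1), 16))


print(repr(lenient_lookup("control-001E")))      # the character U+001E
print(repr(lenient_lookup("control-0021")))      # '!', despite the prefix
print(repr(lenient_lookup("EXCLAMATION MARK")))  # '!', via unicodedata
```

A strict variant would instead compare the prefix with the character's general category and raise when they disagree; which one is better is exactly the part I'm +/-0 on.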

--- Bruce