[Python-ideas] Support Unicode code point notation

Stephen J. Turnbull stephen at xemacs.org
Fri Aug 2 10:32:45 CEST 2013


Alexander Belopolsky writes:
 > On Thu, Aug 1, 2013 at 11:30 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

 >> -1.  The obvious way forward is \N{U+1FFFF}.  That *looks* like an
 >> algorithmically generated name, and (wow!) that's what it *is*.

 > The only problem is that this is not a conforming name according to
 > the Unicode standard. The standard is very explicit in its
 > recommendation on how the names should be generated: "Use in
 > APIs. APIs which return the value of a Unicode “character name” [...]

This whole section of the standard is irrelevant.  Of course
unicodedata.name('A') should *return* 'LATIN CAPITAL LETTER A', but
we're discussing the possibility of extending what
unicodedata.lookup() should *accept*.
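
To make the return/accept distinction concrete, here is a minimal sketch.
It assumes a hypothetical wrapper, lenient_lookup(), which is not part of
unicodedata; the regular expression and the range check are illustrative
choices, not anything the standard or the stdlib prescribes.  The wrapper
accepts "U+XXXX" on input, while unicodedata.name() still returns only
standard names.

```python
import re
import unicodedata

# Assumption: a "U+XXXX" designation is "U+" followed by 4 to 6 hex digits.
_U_PLUS = re.compile(r"U\+([0-9A-Fa-f]{4,6})\Z")


def lenient_lookup(name):
    """Hypothetical wrapper: accept 'U+XXXX' as well as standard names."""
    match = _U_PLUS.match(name)
    if match:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            # Mirror unicodedata.lookup(), which raises KeyError on failure.
            raise KeyError(f"code point out of range: {name!r}")
        return chr(code_point)
    # Anything else goes to the unmodified standard lookup.
    return unicodedata.lookup(name)


# What is *accepted* is extended ...
char = lenient_lookup("U+0041")
# ... but what is *returned* as the name is unchanged.
standard_name = unicodedata.name(char)
```

Nothing about unicodedata.name() changes here; only the set of strings
accepted on input grows.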

 > The recommendation on what should be accepted as a valid name is
 > more relaxed: "... it can be more effective for a user interface to
 > use names that were translated or otherwise adjusted to meet the
 > expectations of the targeted user community."

It seems to me that's exactly what those of us who advocate using \N{}
are saying.

 > This does not literally preclude treating U+NNNN as a character
 > name, but it looks like such use is discouraged: "A constructed
 > code point label is distinguished from the designation of the code
 > point itself (for example, “U+0009” or “U+FFFF”), which is also a
 > unique identifier."

I don't see any such implication.  What's being said here is that an
application should not expect a conforming implementation to treat
"U+0009" and "control-0009" identically in all respects.  For example,
"control-0009" might be subjected to the kind of consistency check you
want.  Or only one of the two might be acceptable to a name lookup
function.  Or you might have to use different functions to convert
them to characters.
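
A rough sketch of the "different functions" case, under stated
assumptions: char_from_label() and char_from_designation() are
hypothetical names, and the consistency check is simply a comparison
against the General_Category reported by unicodedata.  Only the
control, private-use and surrogate labels are handled; the
noncharacter and reserved labels are left out because both fall
under category Cn and would need extra logic to tell apart.

```python
import re
import unicodedata

# Assumption: label prefix -> General_Category the code point must have.
_LABEL_CATEGORY = {"control": "Cc", "private-use": "Co", "surrogate": "Cs"}
_LABEL = re.compile(r"(control|private-use|surrogate)-([0-9A-F]{4,6})\Z")
_DESIGNATION = re.compile(r"U\+([0-9A-F]{4,6})\Z")


def _to_char(hex_digits, original):
    """Convert hex digits to a character, rejecting out-of-range values."""
    code_point = int(hex_digits, 16)
    if code_point > 0x10FFFF:
        raise KeyError(f"code point out of range: {original!r}")
    return chr(code_point)


def char_from_label(label):
    """Hypothetical: resolve a code point label, with a consistency check."""
    match = _LABEL.match(label)
    if match is None:
        raise KeyError(f"not a supported code point label: {label!r}")
    kind, hex_digits = match.groups()
    char = _to_char(hex_digits, label)
    # The label makes a claim about the code point's type; verify it.
    if unicodedata.category(char) != _LABEL_CATEGORY[kind]:
        raise KeyError(f"label does not match code point type: {label!r}")
    return char


def char_from_designation(designation):
    """Hypothetical: resolve 'U+XXXX' with no check on the code point type."""
    match = _DESIGNATION.match(designation)
    if match is None:
        raise KeyError(f"not a code point designation: {designation!r}")
    return _to_char(match.group(1), designation)
```

With these, "control-0009" and "U+0009" both yield a tab, but
"control-0041" is rejected while "U+0041" is accepted, so the two
notations are deliberately not treated identically.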

Steve

