[Python-ideas] Support Unicode code point notation
Stephen J. Turnbull
stephen at xemacs.org
Fri Aug 2 10:32:45 CEST 2013
Alexander Belopolsky writes:
> On Thu, Aug 1, 2013 at 11:30 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> -1. The obvious way forward is \N{U+1FFFF}. That *looks* like an
>> algorithmically generated name, and (wow!) that's what it *is*.
> The only problem is that this is not a conforming name according to
> the Unicode standard. The standard is very explicit in its
> recommendation on how the names should be generated: "Use in
> APIs. APIs which return the value of a Unicode “character name” [...]
This whole section of the standard is irrelevant. Of course
unicodedata.name('A') should *return* 'LATIN CAPITAL LETTER A', but
we're discussing the possibility of extending what
unicodedata.lookup() should *accept*.
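
To make the return/accept distinction concrete, here is a minimal sketch, assuming a hypothetical wrapper called lookup_extended() (not part of the standard library, and not a proposed implementation). It leaves unicodedata.name() untouched and only widens what a lookup accepts to include the U+XXXX form:

```python
import re
import unicodedata

# Hypothetical pattern for a code point designation such as "U+0041".
# The 4-to-6 hex digit limit is an assumption made for this sketch.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def lookup_extended(name):
    """Accept a standard Unicode character name or a U+XXXX designation.

    Hypothetical helper for illustration only.
    """
    match = _CODE_POINT.fullmatch(name)
    if match:
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            # Mirror unicodedata.lookup(), which raises KeyError on failure.
            raise KeyError(name)
        return chr(value)
    # Anything else is handled by the existing lookup, unchanged.
    return unicodedata.lookup(name)


# What is *returned* as a name is still the standard name.
print(unicodedata.name("A"))
# What is *accepted* now includes both spellings.
print(lookup_extended("LATIN CAPITAL LETTER A") == lookup_extended("U+0041"))
```

Whether such behaviour belongs in unicodedata.lookup() itself, in the \N{} escape, or in a separate function is exactly the open question in this thread; the sketch takes no position on it.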
> The recommendation on what should be accepted as a valid name is
> more relaxed: "... it can be more effective for a user interface to
> use names that were translated or otherwise adjusted to meet the
> expectations of the targeted user community."
It seems to me that's exactly what those of us who advocate using \N{}
are saying.
> This does not literally preclude treating U+NNNN as a character
> name, but it looks like such use is discouraged: "A constructed
> code point label is distinguished from the designation of the code
> point itself (for example, “U+0009” or “U+FFFF”), which is also a
> unique identifier."
I don't see any such implication. What's being said here is that an
application should not expect a conforming implementation to treat
"U+0009" and "control-0009" identically in all respects. For example,
"control-0009" might be subjected to the kind of consistency check you
want. Or only one of the two might be acceptable to a name lookup
function. Or you might have to use different functions to convert
them to characters.
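
As one possible illustration of that last point, the sketch below uses two separate, hypothetical functions: from_designation() converts "U+0009" with no further checks, while from_label() converts "control-0009" and also checks that the label's type agrees with the code point's General_Category. The function names, the accepted syntax, and the choice of check are all assumptions for this example, not anything the standard or Python prescribes:

```python
import re
import unicodedata

# Label types and the General_Category each is assumed to imply here.
_LABEL_CATEGORY = {
    "control": "Cc",
    "private-use": "Co",
    "surrogate": "Cs",
    "noncharacter": "Cn",
    "reserved": "Cn",
}

_DESIGNATION = re.compile(r"U\+([0-9A-F]{4,6})")
_LABEL = re.compile(
    r"(control|reserved|noncharacter|private-use|surrogate)-([0-9A-F]{4,6})"
)


def _to_char(digits, original):
    """Convert hex digits to a character, rejecting out-of-range values."""
    value = int(digits, 16)
    if value > 0x10FFFF:
        raise KeyError(original)
    return chr(value)


def from_designation(text):
    """Convert a designation like 'U+0009'; no consistency check."""
    match = _DESIGNATION.fullmatch(text)
    if match is None:
        raise KeyError(text)
    return _to_char(match.group(1), text)


def from_label(text):
    """Convert a label like 'control-0009', checking the label type."""
    match = _LABEL.fullmatch(text)
    if match is None:
        raise KeyError(text)
    kind, digits = match.groups()
    char = _to_char(digits, text)
    # Consistency check: the label type must match the actual category.
    if unicodedata.category(char) != _LABEL_CATEGORY[kind]:
        raise KeyError(text)
    return char


print(from_designation("U+0009") == from_label("control-0009"))
```

Under these assumptions from_designation("U+0041") succeeds, while from_label("control-0041") raises KeyError because U+0041 is a letter, not a control character.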
Steve