[Python-Dev] more unicode: \U support?
M.-A. Lemburg
mal@lemburg.com
Fri, 28 Jul 2000 12:16:09 +0200
Tim Peters wrote:
>
> [/F]
> > would it be a good idea to add \UXXXXXXXX
> > (8 hex digits) to 2.0?
> >
> > (only characters in the 0000-ffff range would
> > be accepted in the current version, of course).
>
> [Tim agreed two msgs later; Guido agreed in private]
>
> [MAL]
> > I don't really get the point of adding \uXXXXXXXX
>
> No: Fredrik's suggestion is with an uppercase U. He is not proposing to
> extend the (lowercase) \u1234 notation.
Ah, ok. So there will be no incompatibility with Java et al.
> > when the internal format used is UTF-16 with support for surrogates.
> >
> > What should \u12341234 map to in a future implementation ?
> > Two Python (UTF-16) Unicode characters ?
>
> \U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
> to UTF-16.
Sure it can: you'd have to use surrogates. Whether this should
happen is another question, but not one we have to deal with
now, since, as Fredrik proposed, \UXXXXXXXX will only work for
0000-FFFF and raise an exception for all higher values.
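
To make the two options concrete, here is a minimal sketch in Python.
It is not the actual tokenizer code: the names to_utf16_units and
decode_big_u, and the allow_surrogates flag, are hypothetical and
exist only for illustration. The default path follows Fredrik's
proposal (reject anything above FFFF); the optional path shows the
surrogate mapping, assuming the standard UTF-16 arithmetic.

```python
import re

MAX_BMP = 0xFFFF        # highest value a single UTF-16 code unit can hold
MAX_UTF16 = 0x10FFFF    # highest code point reachable through surrogates

# Hypothetical helper pattern: a backslash, 'U', then exactly 8 hex digits.
_BIG_U = re.compile(r"\\U([0-9a-fA-F]{8})\Z")


def to_utf16_units(code_point):
    """Return the UTF-16 code units (as ints) for one code point."""
    if not 0 <= code_point <= MAX_UTF16:
        # 8 hex digits can express values UTF-16 cannot reach at all.
        raise ValueError("not representable in UTF-16: %#x" % code_point)
    if code_point <= MAX_BMP:
        return [code_point]
    # Split the 20-bit offset above 0x10000 into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def decode_big_u(escape, allow_surrogates=False):
    """Decode one \\UXXXXXXXX escape into UTF-16 code units.

    With allow_surrogates=False (an assumed flag, mirroring the
    proposal for 2.0) values above FFFF raise an exception.
    """
    match = _BIG_U.match(escape)
    if match is None:
        raise ValueError("expected \\U plus 8 hex digits: %r" % escape)
    code_point = int(match.group(1), 16)
    if code_point > MAX_BMP and not allow_surrogates:
        raise ValueError("\\U value above FFFF not supported: %#x" % code_point)
    return to_utf16_units(code_point)
```

With the default setting, decode_big_u("\\U00001234") yields the single
unit 0x1234 and decode_big_u("\\U00010000") raises. With
allow_surrogates=True the latter becomes the pair 0xD800, 0xDC00, while
something like \U12345678 still fails, since it lies beyond what
surrogates can address.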
> > See
> >
> > http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
> >
> > for how Java defines \uXXXX...
>
> Which I pushed for from the start, and nobody is seeking to change.
>
> > We're following an industry standard here ;-)
>
> \U12345678 is also an industry standard, but in a more recent language (than
> Java) that had more time to consider the eventual implications of Unicode's
> limitations. We reserve the notation now so that it's possible to outgrow
> Unicode later.
Ok.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/