[Python-Dev] more unicode: \U support?
Fri, 28 Jul 2000 12:16:09 +0200
Tim Peters wrote:
> > would it be a good idea to add \UXXXXXXXX
> > (8 hex digits) to 2.0?
> > (only characters in the 0000-ffff range would
> > be accepted in the current version, of course).
> [Tim agreed two msgs later; Guido agreed in private]
> > I don't really get the point of adding \uXXXXXXXX
> No: Fredrik's suggestion is with an uppercase U. He is not proposing to
> extend the (lowercase) \u1234 notation.
Ah, ok. So there will be no incompatibility with Java et al.
> > when the internal format used is UTF-16 with support for surrogates.
> > What should \u12341234 map to in a future implementation ?
> > Two Python (UTF-16) Unicode characters ?
> \U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
> to UTF-16.
Sure it can: you'd have to use surrogates. Whether this should
happen is another question, but not one we have to deal with
now, since, as Fredrik proposed, \UXXXXXXXX will only work for
0000-FFFF and raise an exception for all higher values.
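
To make the two behaviours concrete, here is a minimal sketch, not the
actual implementation. The helper names parse_big_u_escape and
to_surrogate_pair are hypothetical and made up for illustration. The first
mimics the proposed restriction (accept 0000-FFFF, raise otherwise); the
second shows how a later version could map a higher value to a UTF-16
surrogate pair. Note that surrogates only reach up to 10FFFF, so a value
like 12345678 would still have to be rejected.

```python
import string


def parse_big_u_escape(digits, max_code=0xFFFF):
    """Turn the 8 hex digits that follow a backslash-U into a code point.

    Hypothetical helper: max_code models the proposed 2.0 restriction.
    """
    if len(digits) != 8 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected exactly 8 hex digits, got %r" % (digits,))
    code = int(digits, 16)
    if code > max_code:
        raise ValueError("code point %X is out of range" % code)
    return code


def to_surrogate_pair(code):
    """Split a code point in 10000-10FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code <= 0x10FFFF:
        raise ValueError("code point %X has no surrogate pair" % code)
    offset = code - 0x10000           # 20 significant bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low
```

With these definitions, parse_big_u_escape("00001234") returns 0x1234,
parse_big_u_escape("00010000") raises ValueError, and
to_surrogate_pair(0x1D11E) returns (0xD834, 0xDD1E).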
> > See
> > http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc
> > for how Java defines \uXXXX...
> Which I pushed for from the start, and nobody is seeking to change.
> > We're following an industry standard here ;-)
> \U12345678 is also an industry standard, but in a more recent language (than
> Java) that had more time to consider the eventual implications of Unicode's
> limitations. We reserve the notation now so that it's possible to outgrow
> Unicode later.
Python Pages: http://www.lemburg.com/python/