[Python-Dev] more unicode: \U support?

M.-A. Lemburg mal@lemburg.com
Fri, 28 Jul 2000 12:16:09 +0200


Tim Peters wrote:
> 
> [/F]
> > would it be a good idea to add \UXXXXXXXX
> > (8 hex digits) to 2.0?
> >
> > (only characters in the 0000-ffff range would
> >  be accepted in the current version, of course).
> 
> [Tim agreed two msgs later; Guido agreed in private]
> 
> [MAL]
> > I don't really get the point of adding \uXXXXXXXX
> 
> No:  Fredrik's suggestion is with an uppercase U.  He is not proposing to
> extend the (lowercase) \u1234 notation.

Ah, ok. So there will be no incompatibility with Java et al.
 
> > when the internal format used is UTF-16 with support for surrogates.
> >
> > What should \u12341234 map to in a future implementation ?
> > Two Python (UTF-16) Unicode characters ?
> 
> \U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
> to UTF-16.

Sure it can: you'd have to use surrogates. Whether this should
happen is another question, but not one which we'll have to deal
with now, since as Fredrik proposed, \UXXXXXXXX will only
work for 0-FFFF and raise an exception for all higher values.
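
For illustration only, here is a rough sketch of the two pieces discussed
above: the surrogate arithmetic, and the proposed range check for
\UXXXXXXXX. The function names are made up for this example and are not
part of any Python API. Note that surrogate pairs only reach up to
U+10FFFF, so the sketch rejects anything beyond that.

```python
import string


def to_utf16_units(code_point):
    """Return the UTF-16 code units for a code point as a list of ints."""
    if not 0 <= code_point <= 0x10FFFF:
        # Surrogate pairs cannot address anything beyond U+10FFFF.
        raise ValueError("not representable in UTF-16: %#x" % code_point)
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("lone surrogate code point: %#x" % code_point)
    if code_point <= 0xFFFF:
        return [code_point]
    # Split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def parse_big_u_escape(digits):
    """Hypothetical helper: interpret the 8 hex digits following \\U.

    Assumes the restriction proposed in this thread: values above
    0xFFFF raise an exception instead of being turned into surrogates.
    """
    if len(digits) != 8 or any(c not in string.hexdigits for c in digits):
        raise ValueError("\\U needs exactly 8 hex digits")
    value = int(digits, 16)
    if value > 0xFFFF:
        raise ValueError("\\U value out of range: %#x" % value)
    return value
```

With this sketch, parse_big_u_escape("00001234") gives 0x1234, while
parse_big_u_escape("00012345") raises ValueError. If surrogates were
allowed later, to_utf16_units(0x12345) would supply the two code units.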

> > See
> >
> > http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
> >
> > for how Java defines \uXXXX...
> 
> Which I pushed for from the start, and nobody is seeking to change.
> 
> > We're following an industry standard here ;-)
> 
> \U12345678 is also an industry standard, but in a more recent language (than
> Java) that had more time to consider the eventual implications of Unicode's
> limitations.  We reserve the notation now so that it's possible to outgrow
> Unicode later.

Ok.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/