[Python-Dev] more unicode: \U support?

M.-A. Lemburg mal@lemburg.com
Thu, 27 Jul 2000 22:29:59 +0200


Tim Peters wrote:
> 
> [/F]
> > would it be a good idea to add \UXXXXXXXX
> > (8 hex digits) to 2.0?
> >
> > (only characters in the 0000-ffff range would
> > be accepted in the current version, of course).

I don't really get the point of adding \UXXXXXXXX when the
internal format used is UTF-16 with support for surrogates.

What should \U12341234 map to in a future implementation?
Two Python (UTF-16) Unicode characters?
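
To make the question concrete, here is a rough sketch of how one
code point given as 8 hex digits would break down into UTF-16 code
units. The helper name to_utf16_units is made up for illustration
and is not part of Python; the arithmetic is the standard UTF-16
surrogate calculation. Note that surrogates only reach up to
0x10FFFF, so a value like 0x12341234 has no UTF-16 form at all.

```python
def to_utf16_units(code_point):
    """Return the list of UTF-16 code units for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        # Surrogate pairs cannot address anything above 0x10FFFF.
        raise ValueError("code point %#x is out of UTF-16 range" % code_point)
    if code_point < 0x10000:
        # Values in the 0000-ffff range fit in a single code unit.
        return [code_point]
    # Everything else is split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    for cp in (0x1234, 0x10000, 0x1F600):
        units = to_utf16_units(cp)
        print("%08X -> %s" % (cp, " ".join("%04X" % u for u in units)))
```

So under this scheme anything above ffff would indeed come out as
two UTF-16 code units, and 12341234 would have to be rejected.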
 
> [Tim]
> > In which case there seems darned little point to it now <wink/frown>.
> 
> [/F]
> > with Python's approach to escape codes, it's not exactly easy
> > to *add* a new escape code -- you risk breaking code that for
> > some reason (intentional or not) relies on u"\U12345678" to end
> > up as a backslash followed by 9 characters...
> >
> > not very likely, but I've seen stranger things...
> 
> Ah!  You're right, I'm wrong.  +1 on \U12345678 now.

See

http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850

for how Java defines \uXXXX... 

We're following an industry standard here ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/