[I18n-sig] How does Python Unicode treat surrogates?

M.-A. Lemburg mal@lemburg.com
Sat, 23 Jun 2001 12:38:39 +0200


Could someone please restate the original question? The archives
don't seem to have the original postings, and the quotes Martin
has in his reply don't seem to have anything to do with Python.

About surrogate support in Python: the UTF-8 codec has full
surrogate support for encoding and decoding, the unicode-escape
codec can use surrogates when decoding, and all other codecs don't
support surrogates.
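For readers less familiar with surrogates, here is a minimal sketch of the
pair arithmetic. It assumes a current Python 3 interpreter, whose codecs
differ from the 2001 ones discussed in this thread, and the helper names
to_surrogates and from_surrogates are hypothetical. It checks the computed
pair against the utf-16-be codec and shows that strict UTF-8 writes a
supplementary character as a single 4-byte sequence while rejecting
surrogates that were encoded one at a time.

```python
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (leading) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trailing) surrogate
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair: %#x %#x" % (high, low))
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

ch = "\U00010400"
high, low = to_surrogates(ord(ch))
print(hex(high), hex(low))        # 0xd801 0xdc00

# The pair matches what Python's own UTF-16 codec produces.
assert ch.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert from_surrogates(high, low) == ord(ch)

# Strict UTF-8 encodes the character as one 4-byte sequence...
print(ch.encode("utf-8"))         # b'\xf0\x90\x90\x80'

# ...and rejects the same pair when each surrogate is encoded separately.
try:
    b"\xed\xa0\x81\xed\xb0\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```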

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

"Martin v. Loewis" wrote:
> 
> [Uche]
> > Sure.  I admit it's hearsay, but I thought I'd read that because Java
> > Unicode is or was underspecified, there was the possibility of
> > transposition of the high-surrogate with the low-surrogate character
> > between Java implementations or platforms.
> 
> I've tried to find out what problem that could be. So far, I found
> 
> http://developer.java.sun.com/developer/bugParade/bugs/4344266.html
> 
> Here, they complain that the codecs don't properly check for
> surrogates that straddle invocations of convert, or that they produce
> incorrect surrogate pairs. There is a bug report on SourceForge (SF)
> saying that Python has similar problems.
> 
> http://developer.java.sun.com/developer/bugParade/bugs/4328816.html
> 
> summarizes problems with surrogates in UTF-8 that have been fixed;
> again, similar problems are probably present in Python.
> 
> There were also a few bug reports about surrogates working differently
> depending on locale (fail in zh_CN, pass in C), and type of virtual
> machine (fail in classic, pass in hotspot).
> 
> I could not find any report on a bug where surrogates are output in
> incorrect order.
> 
> [Guido]
> > On the XML-SIG the following exchange happened.  I don't know enough
> > about the issues to investigate, but could someone here provide
> > insight?  It seems to boil down to whether or not surrogates
> > may get transposed when moved between platforms.
> 
> I very much doubt this could ever happen.
> 
> Regards,
> Martin
> 
> _______________________________________________
> I18n-sig mailing list
> I18n-sig@python.org
> http://mail.python.org/mailman/listinfo/i18n-sig
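The two Java problems Martin points to above (surrogate pairs that straddle
separate convert calls, and incorrectly paired surrogates) can be
illustrated with a small sketch. It assumes a current Python 3 interpreter
rather than the 2001 codecs discussed here; codecs.getincrementaldecoder is
the standard-library API, while check_pairs is a hypothetical helper written
for this illustration. Because the high surrogate must always come first, a
transposed pair is simply invalid, and a checker can flag it.

```python
import codecs
from typing import List

# Feed the UTF-16-BE bytes of U+10400 in two chunks that split the pair.
data = "\U00010400".encode("utf-16-be")      # b'\xd8\x01\xdc\x00'
dec = codecs.getincrementaldecoder("utf-16-be")()
first = dec.decode(data[:2])                 # high surrogate alone: buffered
second = dec.decode(data[2:], final=True)    # low surrogate completes the pair

def check_pairs(units: List[int]) -> List[str]:
    """Report unpaired or misordered surrogates in a list of UTF-16 code units."""
    problems = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:                # high (leading) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2                           # well-formed pair, skip both
                continue
            problems.append("unpaired high surrogate at index %d" % i)
        elif 0xDC00 <= u <= 0xDFFF:              # low surrogate with no high before it
            problems.append("unpaired low surrogate at index %d" % i)
        i += 1
    return problems

print(check_pairs([0xD801, 0xDC00]))   # []
print(check_pairs([0xDC00, 0xD801]))   # ['unpaired low surrogate at index 0', 'unpaired high surrogate at index 1']
```

A strict decoder treats the transposed bytes the same way: decoding
b'\xdc\x00\xd8\x01' with the utf-16-be codec raises UnicodeDecodeError
instead of silently swapping the units back.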