[I18n-sig] How does Python Unicode treat surrogates?

Paul Prescod paulp@ActiveState.com
Mon, 25 Jun 2001 13:03:54 -0700

Fredrik Lundh wrote:
> I'm sceptical -- I see very little reason to maintain that distinction.
> let's use either UCS-2 or UCS-4 for the internal storage, stick to the
> "character strings are character sequences" concept, and keep the
> UTF-16 surrogate issue where it belongs: in the codecs.

I agree. I'd add that if different people really need different
performance/simplicity trade-offs, then maybe we need multiple variants
of the Unicode object. But please don't take away from those of us who
value simplicity the option of strings that work entirely in terms of
logical characters (code points) rather than physical representation
units (such as UTF-16 code units).
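
To make the code point vs. representation unit distinction concrete,
here is a minimal sketch. It assumes a Python 3.3 or later interpreter
(which postdates this message), where strings are indexed by code point.
The helper name pair_to_code_point is hypothetical and only for
illustration; the surrogate handling happens in the codec, as suggested
above.

```python
# Sketch: one character outside the BMP, seen two ways.
ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I

# Logical view: the string is a sequence of code points.
code_points = [ord(c) for c in ch]

# Physical view: the UTF-16 codec emits 16-bit code units,
# here a high/low surrogate pair.
data = ch.encode("utf-16-be")
code_units = [
    int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)
]


def pair_to_code_point(high, low):
    """Combine a UTF-16 surrogate pair into one code point (hypothetical helper)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(len(ch), [hex(cp) for cp in code_points])
print(len(code_units), [hex(cu) for cu in code_units])
print(hex(pair_to_code_point(*code_units)))
```

With code-point strings, len(), indexing, and slicing never split a
character; only code that encodes to or decodes from UTF-16 has to know
about surrogates.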

Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook