[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Mon, 25 Jun 2001 15:17:57 -0400

Fredrik Lundh writes:
> I'm sceptical -- I see very little reason to maintain that distinction.
> let's use either UCS-2 or UCS-4 for the internal storage, stick to the
> "character strings are character sequences" concept, and keep the
> UTF-16 surrogate issue where it belongs: in the codecs.

How then is u"\U00200000" represented internally if you use UCS-2 as
the internal storage representation?
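
To make the question concrete, here is a minimal sketch of the arithmetic
involved. It assumes the standard UTF-16 surrogate algorithm; the helper
name utf16_code_units is hypothetical and is not part of Python. It takes
a code point as a plain integer and returns the 16-bit code units that a
UCS-2 buffer would have to hold, or None when no such encoding exists.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units for one code point.

    Hypothetical helper for illustration only. Returns None when the
    code point lies beyond the range that surrogate pairs can reach.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0xFFFF:
        # Fits in a single 16-bit unit (the BMP).
        return [code_point]
    if code_point <= 0x10FFFF:
        # Split the 20-bit offset into a high and a low surrogate.
        offset = code_point - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        return [high, low]
    # Above U+10FFFF: no surrogate pair exists for this value.
    return None


for cp in (0x0041, 0x20000, 0x10FFFF, 0x200000):
    print(hex(cp), utf16_code_units(cp))
```

Taken at face value, 0x200000 lies above U+10FFFF, so under these
assumptions a 16-bit buffer cannot hold it even as a surrogate pair. A
value such as 0x20000 can be stored, but only as two code units, so the
string no longer has one storage element per character.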

Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"