[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Tue, 26 Jun 2001 08:17:07 -0400


Fredrik Lundh writes:
> it is not directly supported in Python 2.0, 2.1, and the
> current 2.2 codebase.  no amount of arguing or wishful
> thinking will change that.

It is supported insofar as I can write

u"\U00020000"

and get the UTF-16 encoded u"\ud840\udc00" back. If you limit the
internal representation to UCS-2, then you constrain yourself to
Plane 0 and surrogate pairs have no defined meaning. Hence you would
have to disallow the above notation.
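
For reference, the mapping from U+20000 to that pair is plain UTF-16
arithmetic. The following is only a rough sketch: the helper name
to_surrogate_pair is made up for illustration, and it assumes a
Python 3 interpreter rather than the 2.x releases under discussion.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates.

    Hypothetical helper for illustration; not part of Python itself.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000        # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x20000)
print("%04x %04x" % (high, low))         # prints "d840 dc00"

# Cross-check against the interpreter's own UTF-16 encoder.
assert "\U00020000".encode("utf-16-be") == bytes([0xD8, 0x40, 0xDC, 0x00])
```

Anything below U+10000 needs no pair at all, which is why the helper
rejects it instead of returning a single code unit.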

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"