[I18n-sig] How does Python Unicode treat surrogates?
Tom Emerson
tree@basistech.com
Tue, 26 Jun 2001 08:17:07 -0400
Fredrik Lundh writes:
> it is not directly supported in Python 2.0, 2.1, and the
> current 2.2 codebase. no amount of arguing or wishful
> thinking will change that.
It is supported insofar as I can write
u"\U00020000"
and get the UTF-16 encoded u"\ud840\udc00" back. If you limit the
internal representation to UCS-2, you restrict yourself to Plane 0,
where surrogate pairs have no defined meaning. Hence you would have to
disallow the above notation.
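
For reference, the arithmetic that turns U+20000 into the pair quoted
above can be sketched roughly as follows. This is only an illustration
of the standard UTF-16 mapping, not the interpreter's actual code; the
function name to_surrogate_pair is made up for this example.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates.

    Hypothetical helper for illustration; not part of Python itself.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # 20-bit value to be split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x20000)
print("%04x %04x" % (high, low))         # prints "d840 dc00"
```

Anything at or below U+FFFF is rejected here because Plane 0 characters
are stored as a single 16-bit unit and need no pair.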
-tree
--
Tom Emerson Basis Technology Corp.
Sr. Sinostringologist http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"