[I18n-sig] How does Python Unicode treat surrogates?

Fredrik Lundh fredrik@pythonware.com
Tue, 26 Jun 2001 08:50:07 +0200


Tom Emerson wrote:
> > How then is u"\U00200000" represented internally if you use UCS-2 as
> > the internal storage representation?
> >
> > I think the obvious answer is: It is not supported. It will give an
> > exception when you try to convert a UTF-8 or UTF-16 string that has
> > such a character, and it will be an error if you pass a surrogate to
> > unichr, or in a \u literal.
> 
> So the characters added in Unicode 3.1 in planes 1, 2, and 14 would
> not be representable in Python? Seems a bit draconian to make your
> life easier.

characters outside the BMP (planes 1, 2, and 14) are not
directly supported in Python 2.0, 2.1, or the current 2.2
codebase.  no amount of arguing or wishful thinking will
change that.
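
for reference, here is a minimal sketch of the surrogate arithmetic
a 16-bit storage format would need in order to carry a character
from planes 1 through 16.  it illustrates the UTF-16 encoding rule
only, and is not how the 2.x codebase behaves; the function names
are made up for this example.  note that the U+00200000 in the
question above lies beyond U+10FFFF, the last Unicode code point,
so it has no surrogate form at all.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into UTF-16 surrogates.

    Hypothetical helper for illustration; returns (high, low) as ints.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("no surrogate form for code point %#x" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF, plane 1) as two 16-bit units
high, low = to_surrogate_pair(0x1D11E)
print("%04X %04X" % (high, low))
```

code that indexes or slices such storage sees two units for one
character, which is the practical cost of the 16-bit representation.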

</F>