[I18n-sig] Re: How does Python Unicode treat surrogates?

M.-A. Lemburg mal@lemburg.com
Mon, 25 Jun 2001 22:31:33 +0200


Mark Davis wrote:
> 
> > My question was aimed in a slightly different direction,
> > though. I know that UTF-16 does not allow lone surrogates, but
> > how does Unicode itself treat these? If I have a sequence of Unicode
> > code points which includes an isolated surrogate code point,
> > would this be considered a legal Unicode sequence or not?
> 
> It is a legal Unicode code point sequence. However, it is not a legal
> Unicode *character* sequence, since it contains code points that by
> definition cannot be used to represent characters.

So it's basically a matter of viewing a string as a sequence
of characters vs. a sequence of code points.
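To make the distinction concrete, here is a minimal sketch. It assumes current Python 3 semantics, not the Python 2.x Unicode implementation this thread was discussing. A Python 3 `str` is a sequence of code points, so it can hold a lone surrogate such as U+D800. The strict UTF-8 codec refuses to encode it, because the result would not be a valid character sequence. The helper names `is_surrogate` and `is_valid_character_sequence` are hypothetical, chosen only for illustration.

```python
def is_surrogate(cp: int) -> bool:
    # Surrogate code points occupy the range U+D800..U+DFFF.
    return 0xD800 <= cp <= 0xDFFF


def is_valid_character_sequence(s: str) -> bool:
    # Hypothetical check: a code point sequence can only represent
    # characters if it contains no surrogate code points at all.
    return not any(is_surrogate(ord(ch)) for ch in s)


# A legal code point sequence containing an isolated surrogate.
s = "abc\ud800def"
print(len(s))                          # number of code points
print(is_valid_character_sequence(s))  # does it represent characters?

try:
    # The strict UTF-8 codec rejects surrogate code points.
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    print("not encodable as UTF-8:", exc.reason)
```

Python 3 also provides the `surrogatepass` error handler (`s.encode("utf-8", "surrogatepass")`), which lets such code point sequences pass through anyway. That is a deliberate opt-out from the character-sequence rule, not a sign that the sequence is valid.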

Thanks for the explanation,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/