[I18n-sig] Re: [Python-Dev] Unicode debate

Fredrik Lundh <effbot@telia.com>
Tue, 2 May 2000 11:00:07 +0200

M.-A. Lemburg <mal@lemburg.com> wrote:
> Just a small note on the subject of a character being atomic
> which seems to have been forgotten by the discussing parties:
> Unicode itself can be understood as multi-word character
> encoding, just like UTF-8. The reason is that Unicode entities
> can be combined to produce single display characters (e.g.
> u"e"+u"\u0301" will print "é" in a Unicode aware renderer).
> Slicing such a combined Unicode string will have the same
> effect as slicing UTF-8 data.

really?  does it result in a decoder error?  or does it just result
in a rendering error, just as if you slice off any trailing character
without looking...

> It seems that most Latin-1 proponents have single
> display characters in mind. While the same is true for
> many Unicode entities, there are quite a few cases of
> combining characters in Unicode 3.0, and the Unicode
> normalization algorithm uses these as the basis for its
> work.

do we support automatic normalization in 1.6?
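For comparison, a sketch using present-day Python 3's `unicodedata.normalize`. It does not describe what 1.6 shipped. It shows that the decomposed form (u"e"+u"\u0301") and the precomposed U+00E9 compare unequal until normalization is requested explicitly.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Strings are not normalized automatically, so these differ.
print(decomposed == precomposed)   # False

# Normalization is an explicit call: NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, len(nfc))  # True 1
print(nfd == decomposed, len(nfd))   # True 2
```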