[Python-3000] How will unicode get used?

Paul Prescod paul at prescod.net
Tue Sep 26 22:44:07 CEST 2006


I misspoke. I meant to ask: "How do you normalize away surrogate pairs in
UTF-16?" It was a rhetorical question. The point was just that decomposed
characters can be handled by implicit or explicit normalization. Surrogate
pairs can be normalized away in the same way only if your model lets you
represent their normalized forms, that is, single code points above U+FFFF.
A model whose characters are UTF-16 code units does not.
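
To make the distinction concrete, here is a minimal sketch in Python. It
assumes an interpreter whose str type holds full code points, and
utf16_units is a hypothetical helper written only for this illustration.

```python
import unicodedata

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A decomposed character can be normalized away: 'e' + COMBINING ACUTE ACCENT
# composes to the single code point U+00E9 under NFC.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# A surrogate pair cannot: U+1D11E (MUSICAL SYMBOL G CLEF) needs two UTF-16
# code units in every normalization form.
astral = "\U0001D11E"
units_per_form = {
    form: utf16_units(unicodedata.normalize(form, astral))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
```

In a model whose characters are 16-bit units, every entry of
units_per_form still has length two, so there is nothing to normalize to.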

On 9/26/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> Paul Prescod schrieb:
> >  There is at least one big difference between surrogate pairs and
> > decomposed characters. The user can typically normalize away
> > decompositions. How do you normalize away decompositions in a language
> > that only supports 16-bit representations?
>
> I don't see the problem: You use UTF-16; all normal forms (NFC, NFD,
> NFKC, NFKD) can be represented in UTF-16 just fine.
>
> It is somewhat tricky to implement a normalization algorithm in
> UTF-16, since you must combine surrogate pairs first in order to
> find out what the canonical decomposition of the code point is;
> but it's just more code, and no problem in principle.
>
> Regards,
> Martin
>
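
The "combine surrogate pairs first" step that Martin describes might look
roughly like the sketch below. This is only an illustration under my own
assumptions, not anyone's actual implementation: the function names
combine_surrogates, split_surrogates and normalize_utf16 are hypothetical,
and the real normalization work is delegated to unicodedata.

```python
import unicodedata

def combine_surrogates(units):
    """Turn UTF-16 code units into code points, joining surrogate pairs."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            )
            i += 2
        else:
            # BMP code unit (or an unpaired surrogate, passed through as-is)
            code_points.append(unit)
            i += 1
    return code_points

def split_surrogates(code_points):
    """Turn code points back into UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units

def normalize_utf16(units, form="NFD"):
    """Normalize a UTF-16 code unit sequence by working on code points."""
    text = "".join(chr(cp) for cp in combine_surrogates(units))
    normalized = unicodedata.normalize(form, text)
    return split_surrogates([ord(ch) for ch in normalized])
```

For example, normalize_utf16([0xD834, 0xDD5E]) takes U+1D15E (MUSICAL
SYMBOL HALF NOTE) to its canonical decomposition, which comes back as two
surrogate pairs. The output is still UTF-16, which is Martin's point; the
pairs themselves are still there, which is mine.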