[Python-3000] How will unicode get used?

"Martin v. Löwis" martin at v.loewis.de
Tue Sep 26 21:14:29 CEST 2006


Paul Prescod wrote:
>  There is at least one big difference between surrogate pairs and
> decomposed characters. The user can typically normalize away
> decompositions. How do you normalize away decompositions in a language
> that only supports 16-bit representations?

I don't see the problem: You use UTF-16; all normal forms (NFC, NFD,
NFKC, NFKD) can be represented in UTF-16 just fine.
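To illustrate the claim, here is a small Python 3 check. It assumes a sample string chosen purely for illustration: "e" followed by U+0301 COMBINING ACUTE ACCENT, plus U+1D15E MUSICAL SYMBOL HALF NOTE, which lies outside the BMP and therefore needs a surrogate pair in UTF-16. The helper `roundtrips_in_utf16` is a hypothetical name. Each normal form is produced with the standard `unicodedata` module and pushed through a UTF-16 encode/decode cycle.

```python
import unicodedata

# Illustrative sample: a decomposed "e" + COMBINING ACUTE ACCENT, a space,
# and U+1D15E, which is encoded as a surrogate pair in UTF-16.
sample = "e\u0301 \U0001D15E"

def roundtrips_in_utf16(text: str) -> bool:
    """Return True if text survives encoding to UTF-16 and decoding back."""
    return text.encode("utf-16-le").decode("utf-16-le") == text

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, sample)
    # Show the code points of each normal form and whether UTF-16 preserves it.
    print(form, [hex(ord(c)) for c in normalized], roundtrips_in_utf16(normalized))
```

Every normal form is just a sequence of code points, and UTF-16 can encode any code point, so the round trip holds for all four forms.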

It is somewhat tricky to implement a normalization algorithm in
UTF-16, since you must combine surrogate pairs first in order to
find out what the canonical decomposition of the code point is;
but it's just more code, and no problem in principle.
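The extra step can be sketched as follows, assuming the text arrives as a list of 16-bit code unit integers. The function names (`combine_surrogates`, `split_to_units`, `nfd_utf16`) are hypothetical, and the decomposition itself is delegated to Python's `unicodedata` module rather than reimplemented from the Unicode tables. The point is the order of operations: surrogate pairs are combined into code points before normalization, and the result is split back into code units afterwards.

```python
import unicodedata

def combine_surrogates(units):
    """Yield code points from a sequence of UTF-16 code units (ints)."""
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High + low surrogate -> one supplementary code point.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP code unit (an unpaired surrogate is passed through as-is).
            yield u
            i += 1

def split_to_units(cp):
    """Encode one code point as a list of UTF-16 code units."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

def nfd_utf16(units):
    """Apply NFD to UTF-16 code units: combine pairs, normalize, re-split."""
    text = "".join(chr(cp) for cp in combine_surrogates(units))
    normalized = unicodedata.normalize("NFD", text)
    out = []
    for ch in normalized:
        out.extend(split_to_units(ord(ch)))
    return out
```

For example, the surrogate pair D834 DD5E (U+1D15E) must first be combined into a single code point; only then can its canonical decomposition (U+1D157 U+1D165) be found, and each of those is again written out as a surrogate pair.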

Regards,
Martin

