[Python-3000] How will unicode get used?
"Martin v. Löwis"
martin at v.loewis.de
Tue Sep 26 21:14:29 CEST 2006
Paul Prescod wrote:
> There is at least one big difference between surrogate pairs and
> decomposed characters. The user can typically normalize away
> decompositions. How do you normalize away decompositions in a language
> that only supports 16-bit representations?
I don't see the problem: you use UTF-16; all normalization forms (NFC,
NFD, NFKC, NFKD) can be represented in UTF-16 just fine.
It is somewhat tricky to implement a normalization algorithm in
UTF-16, since you must combine surrogate pairs first in order to
find out what the canonical decomposition of the code point is;
but it's just more code, and no problem in principle.
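To illustrate the "combine surrogate pairs first" step, here is a minimal sketch in Python. It is not how any particular implementation does it: the helper names (decode_utf16_units, encode_utf16_units, normalize_utf16) are made up for this example, the input is assumed to be a list of 16-bit integers, and the actual normalization is delegated to the standard unicodedata module rather than implemented on the code units directly.

```python
import unicodedata


def decode_utf16_units(units):
    """Combine surrogate pairs in a list of 16-bit code units into code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit (or an unpaired surrogate, passed through unchanged)
            code_points.append(unit)
            i += 1
    return code_points


def encode_utf16_units(code_points):
    """Split code points above U+FFFF back into surrogate pairs."""
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))
            units.append(0xDC00 + (offset & 0x3FF))
        else:
            units.append(cp)
    return units


def normalize_utf16(units, form="NFC"):
    """Normalize a UTF-16 code unit sequence; the result is again UTF-16."""
    text = "".join(chr(cp) for cp in decode_utf16_units(units))
    normalized = unicodedata.normalize(form, text)
    return encode_utf16_units([ord(ch) for ch in normalized])


# 'e' followed by COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
print(normalize_utf16([0x0065, 0x0301], "NFC"))  # [233]
```

The same function handles non-BMP input: a surrogate pair is joined into one code point before the decomposition lookup, and the result is split back into pairs afterwards, so the normalized text stays in UTF-16 throughout.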
Regards,
Martin