[issue26917] Inconsistency in unicodedata.normalize()?

Tue May 3 05:35:40 EDT 2016

STINNER Victor added the comment:

I tested the same inputs with http://minaret.info/test/normalize.msp and compared the results with Python:

(1)

꾸ᆧ (afb8 11a7) --NFC or NFKC--> 꾸ᆧ (afb8, 11a7) === same as Python
꾸ᆧ (afb8 11a7) --NFD or NFKD--> 꾸ᆧ (1101 116e, 11a7) === same as Python

(2)

꾸ᆧ (1101 116e 11a7) --NFC or NFKC--> 꾸 (afb8) === same as Python
꾸ᆧ (1101 116e 11a7) --NFD or NFKD--> 꾸ᆧ (1101 116e, 11a7) === same as Python

(3)

꾸ᆧ㤺 (afb8 11a7 2f8a1) --NFC or NFKC--> 꾸ᆧ㤺 (afb8, 11a7, 393a) === DIFFERENT from Python: Python drops the U+11A7 character
꾸ᆧ㤺 (afb8 11a7 2f8a1) --NFD or NFKD--> 꾸ᆧ㤺 (1101 116e, 11a7, 393a) === same as Python
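
For anyone who wants to reproduce the Python side of this comparison, a minimal sketch could look like the one below. The helper names codepoints() and report() are made up for illustration and are not part of unicodedata. The results depend on the Python version and its Unicode database, so no output is quoted here.

```python
import unicodedata

# The three inputs from cases (1), (2) and (3) above.
CASES = {
    "(1)": "\uafb8\u11a7",
    "(2)": "\u1101\u116e\u11a7",
    "(3)": "\uafb8\u11a7\U0002f8a1",
}

FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def codepoints(text):
    """Return the code points of text as space-separated lowercase hex."""
    return " ".join(f"{ord(ch):04x}" for ch in text)


def report():
    """Normalize every case with every form; return (label, form, before, after) rows."""
    rows = []
    for label, text in CASES.items():
        for form in FORMS:
            result = unicodedata.normalize(form, text)
            rows.append((label, form, codepoints(text), codepoints(result)))
    return rows


for label, form, before, after in report():
    print(f"{label} {before} --{form}--> {after}")
```

Comparing the NFC/NFKC rows of case (3) against the website output shows whether U+11A7 survives normalization in the interpreter being tested.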

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26917>
_______________________________________