[issue26917] Inconsistency in unicodedata.normalize()?
STINNER Victor
report at bugs.python.org
Tue May 3 05:35:40 EDT 2016
STINNER Victor added the comment:
I tested http://minaret.info/test/normalize.msp
(1)
꾸ᆧ (afb8 11a7) --NFC or NFKC--> 꾸ᆧ (afb8, 11a7) === same as Python
꾸ᆧ (afb8 11a7) --NFD or NFKD--> 꾸ᆧ (1101 116e, 11a7) === same as Python
(2)
꾸ᆧ (1101 116e 11a7) --NFC or NFKC--> 꾸 (afb8) === same as Python
꾸ᆧ (1101 116e 11a7) --NFD or NFKD--> 꾸ᆧ (1101 116e, 11a7) === same as Python
(3)
꾸ᆧ㤺 (afb8 11a7 2f8a1) --NFC or NFKC--> 꾸ᆧ㤺 (afb8, 11a7, 393a) == DIFFERENT from Python: Python eats the U+11a7 character
꾸ᆧ㤺 (afb8 11a7 2f8a1) --NFD or NFKD--> 꾸ᆧ㤺 (1101 116e, 11a7, 393a) === same as Python
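To reproduce the Python side of this comparison locally, something like the following sketch could be used. The helper names codepoints() and normalization_table() are made up for illustration and are not part of unicodedata; only unicodedata.normalize() is the real API. The NFC/NFKC rows depend on the Python version being run, so no expected output is shown for them.

```python
import unicodedata


def codepoints(s):
    """Return the code points of s as space-separated lowercase hex."""
    return " ".join(f"{ord(ch):x}" for ch in s)


# The three inputs from the comparison above, written as escapes so that
# the editor or file encoding cannot silently re-normalize them.
SAMPLES = [
    "\uafb8\u11a7",            # (1) precomposed syllable followed by U+11A7
    "\u1101\u116e\u11a7",      # (2) decomposed jamo followed by U+11A7
    "\uafb8\u11a7\U0002f8a1",  # (3) same as (1), plus U+2F8A1
]


def normalization_table(samples=SAMPLES):
    """Return (input, form, output) rows, all rendered as hex code points."""
    rows = []
    for s in samples:
        for form in ("NFC", "NFKC", "NFD", "NFKD"):
            result = unicodedata.normalize(form, s)
            rows.append((codepoints(s), form, codepoints(result)))
    return rows


if __name__ == "__main__":
    for src, form, out in normalization_table():
        print(f"{src} --{form}--> {out}")
```

Each printed line can then be compared by hand against the output of the online normalizer for the same input.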
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26917>
_______________________________________