[issue10254] unicodedata.normalize('NFC', s) regression
Alexander Belopolsky
report at bugs.python.org
Fri Dec 17 02:34:52 CET 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
The logic suggested by Martin in msg120018 looks right to me, but the code as a whole seems unnecessarily complex. (And comb1==comb may need to be changed to comb1>=comb.) I don't understand why a linear search through the "skipped" array is needed. At the very least, instead of adding their positions to the "skipped" list, combining characters that have already been used could be replaced by a non-character that is skipped later. A better algorithm should avoid the whole issue of "skipping" by properly computing the length of the decomposed character. See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java.
I'll try to come up with a patch.
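The following is a rough Python sketch of compaction-based canonical composition. It is modeled loosely on the shape of internalCompose() in the UAX #15 reference code, as I understand it, and is not a transcription of it or of CPython's C code. Characters that get combined are never recorded in a "skipped" list. Instead, a separate write index only advances for characters that survive, so the composed length falls out of the loop. The blocking test treats a kept intervening mark of equal or higher class as blocking, which is the >= condition suggested above. The names build_composition_table() and compose() are hypothetical. Deriving the composition table from unicodedata itself is an assumption made to keep the sketch short. Hangul syllables, which are composed arithmetically, are not handled.

```python
import unicodedata


def build_composition_table():
    """Return {(starter, combining): primary_composite}.

    Assumption: derived from unicodedata itself. A code point counts as a
    primary composite when it has a two-character canonical decomposition
    and NFC maps it to itself, which filters out composition exclusions.
    Hangul syllables (composed arithmetically) are not covered.
    """
    table = {}
    for cp in range(0x110000):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility decomposition
        parts = decomp.split()
        if len(parts) != 2 or unicodedata.normalize("NFC", ch) != ch:
            continue  # singleton decomposition, or excluded from composition
        first, second = (chr(int(p, 16)) for p in parts)
        table[(first, second)] = ch
    return table


def compose(decomposed, table):
    """Canonically compose an NFD string by compacting it in place.

    comp_pos is the write index: it advances only for characters that are
    kept, so combined characters simply disappear and no "skipped" list
    (or linear search through one) is needed.
    """
    chars = list(decomposed)
    if not chars:
        return ""
    starter_pos = 0
    starter = chars[0]
    # 256 exceeds every combining class, so a leading non-starter never composes.
    last_class = 0 if unicodedata.combining(starter) == 0 else 256
    comp_pos = 1
    for ch in chars[1:]:
        ch_class = unicodedata.combining(ch)
        composite = table.get((starter, ch))
        # Unblocked if ch is adjacent to the starter (last_class == 0), or if
        # every kept character since the starter has a strictly lower class.
        # A kept mark whose class is >= ch_class blocks the composition.
        if composite is not None and (last_class < ch_class or last_class == 0):
            chars[starter_pos] = composite
            starter = composite
        else:
            if ch_class == 0:
                starter_pos = comp_pos  # ch becomes the new starter
                starter = ch
            last_class = ch_class
            chars[comp_pos] = ch
            comp_pos += 1
    return "".join(chars[:comp_pos])


if __name__ == "__main__":
    table = build_composition_table()
    nfd = unicodedata.normalize("NFD", "Ti\u1ebfng Vi\u1ec7t")
    print(compose(nfd, table) == unicodedata.normalize("NFC", nfd))
```

Compacting with a write index, rather than marking used characters with a sentinel non-character, also avoids a second pass to strip the sentinels out.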
----------
assignee: -> belopolsky
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________
More information about the Python-bugs-list mailing list