[issue10254] unicodedata.normalize('NFC', s) regression
Alexander Belopolsky
report at bugs.python.org
Fri Dec 17 20:17:48 CET 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
On Fri, Dec 17, 2010 at 2:08 PM, Martin v. Löwis <report at bugs.python.org> wrote:
..
>> As far as I (and a two-line script) can tell
>> the maximum length of a canonical decomposition of a character is 4.
>
> Even better - so allowing for 20 characters should be safe.
I don't disagree, but the number of "break" and "continue" statements
before cskipped++ makes me nervous. That said, I am going to add the
test cases from the first post to test_unicodedata (I think it is a
better place than test_normalization because the latter is skipped by
default) and commit.
Improving the algorithm is a separate issue.
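
For reference, the claim quoted above about the maximum canonical
decomposition length can be checked with a short script. The script
actually used is not shown in this thread, so what follows is only a
sketch under assumptions: it targets Python 3, the function name
longest_canonical_decomposition is hypothetical, and it relies on the
NFD form of a single character being that character's full canonical
decomposition. The result depends on the Unicode database bundled with
the interpreter.

```python
import sys
import unicodedata


def longest_canonical_decomposition():
    """Return (length, code point) for the longest full canonical decomposition.

    Hypothetical helper, not taken from the tracker issue.
    """
    best_len, best_cp = 0, None
    for cp in range(sys.maxunicode + 1):
        # NFD of a one-character string is its full canonical decomposition.
        decomposed = unicodedata.normalize("NFD", chr(cp))
        if len(decomposed) > best_len:
            best_len, best_cp = len(decomposed), cp
    return best_len, best_cp


if __name__ == "__main__":
    length, cp = longest_canonical_decomposition()
    print(f"longest decomposition: {length} code points (U+{cp:04X})")
```

A character such as U+1F82 (a Greek alpha with three diacritics)
decomposes into a base letter plus three combining marks, which is one
way to reach a length of 4.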
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________