[issue10254] unicodedata.normalize('NFC', s) regression

Alexander Belopolsky report at bugs.python.org
Fri Dec 17 20:17:48 CET 2010


Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

On Fri, Dec 17, 2010 at 2:08 PM, Martin v. Löwis <report at bugs.python.org> wrote:
..
>> As far as I (and a two-line script) can tell
>> the maximum length of a canonical decomposition of a character is 4.
>
> Even better - so allowing for 20 characters should be safe.
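For reference, a check like the "two-line script" mentioned above might look roughly like this. It is a sketch, not the script that was actually used: it assumes the full canonical decomposition of a code point is what unicodedata.normalize('NFD', ...) returns for that single character, and reports the longest one in the running Python's Unicode database (the helper name max_canonical_decomposition is made up for illustration).

```python
import sys
import unicodedata


def max_canonical_decomposition():
    """Return (length, code point) of the longest full canonical
    decomposition of a single code point, i.e. the longest NFD result,
    in this interpreter's Unicode database (hypothetical helper)."""
    best_len, best_cp = 0, 0
    for cp in range(sys.maxunicode + 1):
        # NFD applies canonical decomposition recursively, so this is the
        # fully expanded length, not just the first level reported by
        # unicodedata.decomposition().
        n = len(unicodedata.normalize('NFD', chr(cp)))
        if n > best_len:
            best_len, best_cp = n, cp
    return best_len, best_cp


if __name__ == '__main__':
    length, cp = max_canonical_decomposition()
    print(unicodedata.unidata_version, length, 'U+%04X' % cp)
```

One character that reaches length 4 is U+1F85 GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI, which expands to alpha + U+0314 + U+0301 + U+0345. The exact maximum depends on the Unicode version compiled into the interpreter.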

I don't disagree, but the number of "break" and "continue" statements
before cskipped++ makes me nervous.  That said, I am going to add
test cases from the first post to test_unicodedata (I think it is a
better place than test_normalise because the latter is skipped by
default) and commit.
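The test cases from the first post are not reproduced here. As a rough illustration only, a regression test in test_unicodedata style could check NFC against invariants that any correct implementation must satisfy, using a base letter followed by several combining marks (the situation where the composition loop skips marks). The class name and sample strings below are hypothetical; the one fixed expectation, D WITH DOT ABOVE + COMBINING DOT BELOW composing to D WITH DOT BELOW + COMBINING DOT ABOVE, is the standard example from UAX #15.

```python
import unicodedata
import unittest

# Hypothetical inputs: a base letter plus several combining marks, some of
# which compose with the base and some of which must be left uncomposed.
SAMPLES = [
    'a\u0323\u0302\u0301',       # a + dot below + circumflex + acute
    'D\u0307\u0323',             # D + dot above + dot below
    '\u1e0a\u0323',              # D WITH DOT ABOVE + dot below
    'e\u0300\u0316\u0301',       # e + grave + grave below + acute
    '\u03b1\u0314\u0301\u0345',  # alpha + dasia + oxia + ypogegrammeni
]


class NFCRegressionSketch(unittest.TestCase):
    def test_nfc_equals_nfc_of_nfd(self):
        # NFC must give the same result whether or not the input was
        # decomposed first.
        for s in SAMPLES:
            with self.subTest(s=ascii(s)):
                nfd = unicodedata.normalize('NFD', s)
                self.assertEqual(unicodedata.normalize('NFC', s),
                                 unicodedata.normalize('NFC', nfd))

    def test_nfc_is_idempotent(self):
        for s in SAMPLES:
            nfc = unicodedata.normalize('NFC', s)
            with self.subTest(s=ascii(s)):
                self.assertEqual(unicodedata.normalize('NFC', nfc), nfc)

    def test_known_composition(self):
        # Dot below (ccc 220) is reordered before dot above (ccc 230),
        # composes with D, and the dot above is left as a combining mark.
        self.assertEqual(unicodedata.normalize('NFC', '\u1e0a\u0323'),
                         '\u1e0c\u0307')
```

Saved as a module, this runs with `python -m unittest`.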

Improving the algorithm is a separate issue.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________


More information about the Python-bugs-list mailing list