[issue10254] unicodedata.normalize('NFC', s) regression
Alexander Belopolsky
report at bugs.python.org
Fri Dec 17 18:24:54 CET 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
On Fri, Dec 17, 2010 at 3:47 AM, Martin v. Löwis <report at bugs.python.org> wrote:
..
> The worst case (wrt. cskipped) is the maximum number of characters that
> can get combined into a single base character. It used to be (and I
> hope still is) 20 (decomposition of U+FDFA).
>
The C forms (NFC and NFKC) do canonical composition, and U+FDFA is a
compatibility composite. (BTW, makeunicodedata.py checks that the maximum
decomposed length of a character is < 19, but it would be better if it
computed and defined a named constant, say MAXDLENGTH, to be used
instead of the literal 20.) As far as I (and a two-line script) can tell,
the maximum length of a canonical decomposition of a character is 4.
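
A check along these lines can be sketched as follows. This is not the
two-line script mentioned above, only an illustration: the helper name
max_decomposed_length is made up here, and it assumes that the length of
unicodedata.normalize('NFD', ch) is a fair stand-in for the full canonical
decomposition length of ch (with 'NFKD' playing the same role for
compatibility decompositions). The result depends on the Unicode database
shipped with the interpreter.

```python
import sys
import unicodedata


def max_decomposed_length(form):
    """Return (length, code point) of the longest full decomposition.

    `form` should be 'NFD' (canonical) or 'NFKD' (compatibility).
    Hypothetical helper, for illustration only.
    """
    return max(
        (len(unicodedata.normalize(form, chr(cp))), cp)
        for cp in range(sys.maxunicode + 1)
    )


# Longest canonical decomposition over all code points.
canonical_len, canonical_cp = max_decomposed_length("NFD")
# Longest compatibility decomposition, for comparison.
compat_len, compat_cp = max_decomposed_length("NFKD")

print("NFD : %d code points, e.g. U+%04X" % (canonical_len, canonical_cp))
print("NFKD: %d code points, e.g. U+%04X" % (compat_len, compat_cp))
```

U+FDFA only shows up in the NFKD figure: NFD leaves it alone, because its
decomposition mapping is a compatibility one.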
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________