[issue10254] unicodedata.normalize('NFC', s) regression

Alexander Belopolsky report at bugs.python.org
Tue Dec 21 20:24:03 CET 2010


Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

In the new patch, issue10254b.diff, I've added a test that would crash unpatched code:

>>> unicodedata.normalize('NFC', 'C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸Ç')
Segmentation fault
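
The same input can be spelled with explicit escapes, which is easier to read than the rendered string. This is only a sketch: it assumes each displayed 'C̸' is 'C' followed by U+0338 (COMBINING LONG SOLIDUS OVERLAY) and that the final character is given in decomposed form as 'C' + U+0327 (COMBINING CEDILLA). The names `decomposed` and `expected` are illustrative and not taken from the patch.

```python
import unicodedata

# Assumption: 20 repetitions of 'C' + U+0338, which has no precomposed form,
# followed by 'C' + U+0327, which composes to U+00C7 under NFC.
decomposed = 'C\u0338' * 20 + 'C\u0327'
expected = 'C\u0338' * 20 + '\xc7'

# On a fixed interpreter this returns normally instead of crashing.
result = unicodedata.normalize('NFC', decomposed)
print(result == expected)
```

On an interpreter without the fix, the call to normalize() is where the crash would be expected.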

Martin, I still feel uneasy about the fixed size of the skipped buffer.  It is not obvious that skipped combining characters always get removed from the buffer before the next starter is processed.

I would really like another pair of eyes to look at this code before it goes in, especially into 2.6.

Victor,

IIRC, you did some stress testing on random data.  I wonder if you could test this code after tightening the assert to cskipped < 4.  (The current theory is that this should be enough.)
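
A stress test along these lines could look like the sketch below. It is an assumption about the general shape of such a test, not a description of what was actually run: the character sets, the helper names (`random_combining_string`, `check_nfc_invariants`, `stress`) and the parameters are all hypothetical. It only checks Python-level invariants of normalization; the tightened assert on cskipped would be exercised separately, in a debug build of the patched C code.

```python
import random
import unicodedata

# Hypothetical alphabet: a few starters and combining marks with
# different canonical combining classes (1, 202, 230, 220, 230).
STARTERS = "CAaeo"
MARKS = "\u0338\u0327\u0301\u0323\u0308"


def random_combining_string(rng, length):
    """Build a random string that is mostly combining marks."""
    return "".join(
        rng.choice(STARTERS) if rng.random() < 0.3 else rng.choice(MARKS)
        for _ in range(length)
    )


def check_nfc_invariants(s):
    """NFC must be idempotent and must agree with NFC applied to NFD."""
    nfc = unicodedata.normalize("NFC", s)
    nfd = unicodedata.normalize("NFD", s)
    return (
        unicodedata.normalize("NFC", nfc) == nfc
        and unicodedata.normalize("NFC", nfd) == nfc
    )


def stress(iterations=1000, max_len=64, seed=0):
    """Return the list of random inputs that violate the invariants."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        s = random_combining_string(rng, rng.randint(0, max_len))
        if not check_nfc_invariants(s):
            failures.append(s)
    return failures


failures = stress()
print(len(failures))
```

Long runs of marks between starters are the interesting case here, since those are the inputs where combining characters get skipped during composition.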

----------
Added file: http://bugs.python.org/file20131/issue10254b.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________

