[issue10567] Unicode space character \u200b unrecognised as a space

Marc-Andre Lemburg report at bugs.python.org
Sun Nov 28 20:35:13 CET 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

It is still strange that the .isspace() result changed, since the
entry for this code point has not changed in recent Unicode versions:

4.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
5.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
5.2.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
6.0.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;

based on http://www.unicode.org/Public/<version>/ucd/UnicodeData.txt
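
For reference, a minimal sketch (written for Python 3, not for the 2.x
interpreters discussed here) that splits the record above into fields and
compares it with what the unicodedata module reports. The field positions
follow the UnicodeData.txt layout; the variable names are only illustrative.

```python
import unicodedata

# The record quoted above; it is identical in 4.1.0, 5.1.0, 5.2.0 and 6.0.0.
RECORD = "200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;"

fields = RECORD.split(";")
code_point = int(fields[0], 16)   # field 0: code point in hex
name = fields[1]                  # field 1: character name
category = fields[2]              # field 2: general category
bidi_class = fields[4]            # field 4: bidirectional class

ch = chr(code_point)
print(name, category, bidi_class)
print(unicodedata.name(ch), unicodedata.category(ch),
      unicodedata.bidirectional(ch))
```

The category is Cf (a format character), not Zs, and the bidirectional
class is BN, so nothing in the record marks the character as a space.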

True
> python2.5 -c 'print u"\u200b".isspace()'
True
> python2.6 -c 'print u"\u200b".isspace()'
True
> python2.7 -c 'print u"\u200b".isspace()'
False

Looking at the code again: Now I know why...

The tables in unicodectype.c were generated from the Unicode database,
but not by the makeunicodedata.py script. I used a separate script to
generate those tables for Python 1.6.0, and it seems they were never
updated after that. Python 2.7 then replaced them with the data from
the makeunicodedata.py script.
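
To illustrate what a database-derived answer looks like, here is a rough
sketch for Python 3. It uses the rule given in the current str.isspace()
documentation (general category Zs, or bidirectional class WS, B or S).
I am assuming, not asserting, that the 2.7 tables follow the same rule;
derived_isspace is a made-up helper name, not something in the stdlib.

```python
import unicodedata


def derived_isspace(ch):
    """Whitespace test derived from the Unicode database.

    Assumption: same rule as documented for str.isspace() in Python 3.
    """
    return (unicodedata.category(ch) == "Zs"
            or unicodedata.bidirectional(ch) in ("WS", "B", "S"))


# Compare the derived value with the built-in method for a few characters.
for ch in (" ", "\t", "\u00a0", "\u200b"):
    print("U+%04X" % ord(ch),
          unicodedata.category(ch),
          unicodedata.bidirectional(ch) or "-",
          derived_isspace(ch),
          ch.isspace())
```

With Cf/BN in the database, U+200B fails both conditions, which fits the
False result from python2.7 above.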

That's probably why Martin thought they were manually maintained.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10567>
_______________________________________

