[issue10567] Unicode space character \u200b unrecognised as a space

Marc-Andre Lemburg report at bugs.python.org
Sun Nov 28 20:35:13 CET 2010

Marc-Andre Lemburg <mal at egenix.com> added the comment:

It is still strange that the .isspace() result changed, since the
database entry for this code point is identical across the recent
Unicode versions:

4.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
5.1.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
5.2.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;
6.0.0: 200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;

based on http://www.unicode.org/Public/<version>/ucd/UnicodeData.txt
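
For reference, here is a minimal sketch of how one of the records above
can be split into fields and checked against a whitespace rule. The
helper names (parse_record, is_space_by_rule) are made up for this
illustration, and the rule itself is an assumption taken from the
current str.isspace() documentation (general category Zs, or
bidirectional class WS, B or S); it is not necessarily what the old
tables encoded.

```python
# One UnicodeData.txt record, copied from the listing above.
RECORD = "200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;"


def parse_record(line):
    """Split a semicolon-separated UnicodeData.txt record (hypothetical helper)."""
    fields = line.split(";")
    return {
        "code_point": int(fields[0], 16),  # field 0: code point in hex
        "name": fields[1],                 # field 1: character name
        "category": fields[2],             # field 2: general category
        "bidi_class": fields[4],           # field 4: bidirectional class
    }


def is_space_by_rule(rec):
    """Assumed rule: category Zs, or bidirectional class WS, B or S."""
    return rec["category"] == "Zs" or rec["bidi_class"] in ("WS", "B", "S")


rec = parse_record(RECORD)
print(rec["name"], rec["category"], rec["bidi_class"], is_space_by_rule(rec))
```

Under that rule the record (Cf, BN) does not qualify as whitespace in
any of the four database versions listed.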

> python2.5 -c 'print u"\u200b".isspace()'
> python2.6 -c 'print u"\u200b".isspace()'
> python2.7 -c 'print u"\u200b".isspace()'
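
The same check can be sketched for a current Python 3 interpreter,
where every str is Unicode and the unicodedata module exposes the
properties directly. This is only an illustration for Python 3; it
says nothing about what the 2.x commands above print.

```python
import unicodedata

ZWSP = "\u200b"

# Properties from the Unicode database bundled with this interpreter.
print(unicodedata.unidata_version)      # database version in use
print(unicodedata.name(ZWSP))           # ZERO WIDTH SPACE
print(unicodedata.category(ZWSP))       # Cf
print(unicodedata.bidirectional(ZWSP))  # BN

# str.isspace() is derived from the generated database tables.
print(ZWSP.isspace())                   # False
```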

Looking at the code again: Now I know why...

The tables in unicodectype.c were generated from the Unicode database,
but not by the makeunicodedata.py script. I used a separate script to
generate those tables for Python 1.6.0, and it seems they have never
been updated since then. Python 2.7 then replaced them with the data
from the makeunicodedata.py script.

That's probably why Martin thought they were manually maintained.


