[issue10567] Unicode space character \u200b unrecognised as a space

Marc-Andre Lemburg report at bugs.python.org
Sun Nov 28 20:07:41 CET 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin at v.loewis.de> added the comment:
> 
> In 2.6, there was a manually maintained list, probably dating back to before Unicode 4.0. 

That's not quite correct: Python 1.6.x through 2.5.x used tables for the
PyUnicode_ISSPACE() function that were generated from the Unicode database.
Python 2.6.x introduced a short-cut table for ASCII whitespace, but still
fell back to the generated tables for non-ASCII code points.
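
As a rough illustration of that two-tier lookup (not the actual C code), the
sketch below precomputes a 128-entry table for ASCII and falls back to the
Unicode database for everything else. The names db_is_space, ASCII_SPACE and
is_space_fast are made up for this example, and the unicodedata module stands
in for the generated C tables.

```python
import unicodedata

def db_is_space(ch):
    # Stand-in for the generated tables: ask the Unicode database directly.
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")

# Hypothetical short-cut table: one precomputed entry per ASCII code point.
ASCII_SPACE = [db_is_space(chr(cp)) for cp in range(128)]

def is_space_fast(ch):
    cp = ord(ch)
    if cp < 128:
        return ASCII_SPACE[cp]   # fast path, plain list index
    return db_is_space(ch)       # slow path for non-ASCII code points
```

The fast path only saves the database lookup; both paths give the same answer
for any given character.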

The tables were never manually maintained, but we also did not update
Python for each new Unicode version:

Python 1.6: Unicode 3.0
Python 2.0: Unicode 3.0
Python 2.1: Unicode 3.0
Python 2.2: Unicode 3.0
Python 2.3: Unicode 3.2
Python 2.4: Unicode 3.2
Python 2.5: Unicode 4.1
Python 2.6: Unicode 5.1
Python 2.7: Unicode 5.2
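
To check which Unicode version a given interpreter was built against, the
unicodedata module exposes it as a string. A minimal sketch; the helper name
unicode_version is just for illustration:

```python
import unicodedata

def unicode_version():
    # unidata_version is a dotted string such as "5.2.0"; the exact value
    # depends on the Python release being run.
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))

print(unicodedata.unidata_version)
```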

> Python uses the following criterion for determining white space characters:
>
> /* Returns 1 for Unicode characters having the bidirectional type
>    'WS', 'B' or 'S' or the category 'Zs', 0 otherwise. */

This definition has been used since Python 1.6.x.
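
For reference, the criterion quoted above can be written out in Python using
the unicodedata module. This is only a sketch of the rule, not the C
implementation, and is_space is a name chosen for this example. The results
depend on the Unicode database shipped with the interpreter, which is why a
character whose properties changed between Unicode versions (as happened with
U+200B) can be classified differently by different Python releases.

```python
import unicodedata

def is_space(ch):
    # Bidirectional type 'WS', 'B' or 'S', or general category 'Zs'.
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")

# Show the two properties the rule looks at for a few characters.
for ch in (" ", "\t", "\n", "\u00a0", "\u200b"):
    print("U+%04X" % ord(ch),
          unicodedata.category(ch),
          unicodedata.bidirectional(ch),
          is_space(ch))
```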

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10567>
_______________________________________

