[issue10567] Unicode space character \u200b unrecognised a space
report at bugs.python.org
Sun Nov 28 20:07:41 CET 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Martin v. Löwis wrote:
> Martin v. Löwis <martin at v.loewis.de> added the comment:
> In 2.6, there was a manually maintained list, probably dating back to before Unicode 4.0.
That's not quite correct: Python 1.6.x - 2.5.x used tables for the
PyUnicode_ISSPACE() function that were created from the Unicode database.
Python 2.6.x introduced a short-cut table for ASCII whitespace, but still
fell back to the generated tables for non-ASCII code points.
The tables were never manually maintained, but we also did not update
Python for each new Unicode version:
Python 1.6: Unicode 3.0
Python 2.0: Unicode 3.0
Python 2.1: Unicode 3.0
Python 2.2: Unicode 3.0
Python 2.3: Unicode 3.2
Python 2.4: Unicode 3.2
Python 2.5: Unicode 4.1
Python 2.6: Unicode 5.1
Python 2.7: Unicode 5.2
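To check which Unicode database a given interpreter was built with, the
standard unicodedata module exposes the version as unidata_version. A minimal
sketch (the helper name unicode_db_version is only for illustration):

```python
import sys
import unicodedata


def unicode_db_version():
    """Return the version string of the Unicode database bundled with this interpreter."""
    return unicodedata.unidata_version


if __name__ == "__main__":
    # Prints one line in the same shape as the list above.
    print("Python %d.%d: Unicode %s"
          % (sys.version_info[0], sys.version_info[1], unicode_db_version()))
```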
> Python uses the following criterion for determining white space characters:
> /* Returns 1 for Unicode characters having the bidirectional type
> 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise. */
This definition has been used since Python 1.6.x.
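The criterion quoted above can be approximated in pure Python with the
unicodedata module. This is only a sketch of the rule as stated in the comment,
not the C implementation; the function name is_space_by_criterion is
hypothetical, and the result for any code point depends on the Unicode
database version of the interpreter running it:

```python
import unicodedata


def is_space_by_criterion(ch):
    """Apply the quoted rule: bidirectional type WS, B or S, or category Zs."""
    bidi = unicodedata.bidirectional(ch)
    return bidi in ("WS", "B", "S") or unicodedata.category(ch) == "Zs"


if __name__ == "__main__":
    # Show the properties the rule looks at for a few sample characters.
    for ch in (" ", "\t", "\u00a0", "\u200b"):
        print("U+%04X category=%s bidi=%s space=%s" % (
            ord(ch),
            unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            is_space_by_criterion(ch),
        ))
```

With a recent Unicode database, U+200B (ZERO WIDTH SPACE) is reported with
category Cf and bidirectional type BN, so it does not satisfy the rule, while
U+0020, U+0009 and U+00A0 do.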
Python tracker <report at bugs.python.org>