[issue13391] string.strip Does Not Remove Zero-Width-Space (ZWSP)

Ezio Melotti report at bugs.python.org
Sun Nov 13 04:44:46 CET 2011


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

str.strip uses Py_UNICODE_ISSPACE, which in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347). According to the comment there, it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise."
The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is returned and the character is not stripped.
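
The check described above can be reproduced from Python. This is only a sketch, assuming Python 3 and the stdlib unicodedata module; the helper is_whitespace_by_documented_rule is a hypothetical name that mirrors the quoted comment, not the actual C implementation.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def is_whitespace_by_documented_rule(ch):
    """Hypothetical helper mirroring the quoted comment:
    bidirectional type 'WS', 'B' or 'S', or category 'Zs'."""
    return (unicodedata.bidirectional(ch) in ("WS", "B", "S")
            or unicodedata.category(ch) == "Zs")


# Properties of U+200B as recorded in the Unicode database.
print("category:     ", unicodedata.category(ZWSP))
print("bidirectional:", unicodedata.bidirectional(ZWSP))

# Neither condition of the rule holds, so the character is not whitespace.
print("rule says whitespace:", is_whitespace_by_documented_rule(ZWSP))
print("str.isspace():       ", ZWSP.isspace())

# Consequently str.strip() leaves a trailing U+200B in place.
text = "abc" + ZWSP
print("strip() removed it:  ", text.strip() != text)
```

Code that does want U+200B removed can pass it explicitly, for example text.strip("\u200b").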

OTOH, Unicode defines the White_Space property and assigns it to 26 chars, whereas _PyUnicode_IsWhitespace includes 4 more chars (U+001C, U+001D, U+001E and U+001F) that should probably be removed.
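
The four extra characters can be inspected the same way. Again this is only a sketch, assuming Python 3; it shows how Python currently classifies them and does not check them against the Unicode White_Space list itself, which the stdlib does not expose.

```python
import unicodedata

# The four control characters mentioned above (U+001C .. U+001F).
EXTRA = [chr(cp) for cp in range(0x1C, 0x20)]

for ch in EXTRA:
    print("U+%04X" % ord(ch),
          "category=%s" % unicodedata.category(ch),
          "bidirectional=%s" % unicodedata.bidirectional(ch),
          "isspace=%s" % ch.isspace())

# Because str.isspace() is true for them, str.strip() removes them too.
stripped = "\x1cabc\x1f".strip()
print(repr(stripped))
```

Their category is 'Cc', not 'Zs'; they count as whitespace only because of their bidirectional type ('B' or 'S').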

I'll close this issue because str.strip() is correct regarding U+200B.

@Martin
Do you think those 4 chars should be removed?
If so I'll open another issue.

----------
assignee:  -> ezio.melotti
nosy: +loewis
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13391>
_______________________________________

