[issue13391] string.strip Does Not Remove Zero-Width-Space (ZWSP)
report at bugs.python.org
Sun Nov 13 04:44:46 CET 2011
Ezio Melotti <ezio.melotti at gmail.com> added the comment:
str.strip uses Py_UNICODE_ISSPACE, which in turn uses _PyUnicode_IsWhitespace (see Objects/unicodetype_db.h#l3347). According to the comment there, it "Returns 1 for Unicode characters having the bidirectional type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise."
The category of U+200B is 'Cf' and its bidirectional type is 'BN', so 0 is returned and the character is not stripped.
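The classification above can be checked from Python with the standard unicodedata module. This is only a rough sketch: describe() and matches_quoted_rule() are hypothetical helper names, and the second one re-states the rule from the quoted comment in Python rather than calling the C function itself.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def describe(ch):
    """Return (general category, bidirectional type) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


def matches_quoted_rule(ch):
    """Hypothetical helper: the rule from the quoted comment, re-stated in
    Python (bidirectional type 'WS', 'B' or 'S', or category 'Zs')."""
    category, bidi = describe(ch)
    return bidi in ("WS", "B", "S") or category == "Zs"


sample = ZWSP + "text" + ZWSP

print(describe(ZWSP))             # ('Cf', 'BN')
print(matches_quoted_rule(ZWSP))  # False
print(sample.strip() == sample)   # True: strip() leaves U+200B in place
```

If the goal is to remove U+200B anyway, it can be passed explicitly, e.g. sample.strip("\u200b"), since strip() only falls back to the whitespace test when called without an argument.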
OTOH, Unicode defines the White_Space property and assigns it to 26 chars, whereas _PyUnicode_IsWhitespace includes 4 more chars (U+001C, U+001D, U+001E and U+001F) that should probably be removed.
I'll close this issue because str.strip() is correct regarding U+200B.
Do you think those 4 chars should be removed?
If so I'll open another issue.
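For reference, the four extra characters in question can be inspected the same way. Again only a sketch using the standard unicodedata module; EXTRA, rows and stripped are illustrative names, not anything taken from the CPython sources.

```python
import unicodedata

# The four control characters mentioned above: U+001C through U+001F.
EXTRA = ["\x1c", "\x1d", "\x1e", "\x1f"]

# For each one: code point, general category, bidirectional type, and
# whether Python itself treats it as whitespace.
rows = [
    (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        ch.isspace(),
    )
    for ch in EXTRA
]

for row in rows:
    print(row)

# Because isspace() is true for them, a bare strip() removes them.
stripped = "\x1c\x1fvalue\x1e\x1d".strip()
print(repr(stripped))
```

All four have the category 'Cc', so they only qualify through their bidirectional type ('B' or 'S'), which is why the rule quoted above accepts them even though they are outside the White_Space property.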
assignee: -> ezio.melotti
resolution: -> invalid
stage: -> committed/rejected
status: open -> closed
Python tracker <report at bugs.python.org>