[issue10521] str methods don't accept non-BMP fillchar on a narrow Unicode build
Ezio Melotti
report at bugs.python.org
Thu Nov 25 07:28:50 CET 2010
Ezio Melotti <ezio.melotti at gmail.com> added the comment:
I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert both normal chars and surrogate pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA.
The attached patch is a proof of concept of this approach and returns True for '\N{OLD ITALIC LETTER A}'.isalpha() on a narrow build.
It still has a number of issues that should be addressed (check for narrow builds, check for lone surrogates, check for high surrogate at the end of a string, fix compiler warnings ...) but it should be good enough as a PoC.
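For illustration only, here is a minimal standalone sketch of the conversion step, not the code from the attached patch. It assumes the string is a buffer of 16-bit code units, as Py_UNICODE is on a narrow build. The names next_code_point, all_code_points and is_alpha_stub are hypothetical, and is_alpha_stub merely stands in for Py_UNICODE_ISALPHA so that the example compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one code point from a buffer of UTF-16 code
   units and advance *pos past it.  A valid surrogate pair is combined into
   a single value; a lone surrogate is returned unchanged. */
static uint32_t
next_code_point(const uint16_t *s, size_t len, size_t *pos)
{
    assert(*pos < len);
    uint32_t ch = s[(*pos)++];

    /* High surrogate followed by a low surrogate: one non-BMP code point. */
    if (ch >= 0xD800 && ch <= 0xDBFF && *pos < len) {
        uint32_t low = s[*pos];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            (*pos)++;
            return 0x10000 + (((ch - 0xD800) << 10) | (low - 0xDC00));
        }
    }
    return ch;
}

/* Stand-in for Py_UNICODE_ISALPHA (assumption): accepts only ASCII letters
   and U+10300 OLD ITALIC LETTER A. */
static int
is_alpha_stub(uint32_t ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
           || ch == 0x10300;
}

/* Apply a per-code-point predicate to the whole string, the way
   unicode_isalpha would apply Py_UNICODE_ISALPHA to each character. */
static int
all_code_points(const uint16_t *s, size_t len, int (*pred)(uint32_t))
{
    size_t pos = 0;

    if (len == 0)
        return 0;               /* ''.isalpha() is False */
    while (pos < len) {
        if (!pred(next_code_point(s, len, &pos)))
            return 0;
    }
    return 1;
}
```

On a narrow build '\N{OLD ITALIC LETTER A}' is stored as the pair 0xD800 0xDF00, which this sketch combines into 0x10300 before the predicate sees it. A high surrogate at the end of the buffer is passed through as a lone surrogate, which is one possible answer to the open issues listed above.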
I would also suggest introducing a set of macros to handle surrogates (e.g. detect, combine) and using them in all the functions that need to work with surrogates.
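As a rough sketch of what such macros might look like (the names below are hypothetical and are not taken from the patch or from any existing header):

```c
#include <assert.h>
#include <stdint.h>

/* Detection macros (hypothetical names). */
#define IS_HIGH_SURROGATE(ch) ((ch) >= 0xD800 && (ch) <= 0xDBFF)
#define IS_LOW_SURROGATE(ch)  ((ch) >= 0xDC00 && (ch) <= 0xDFFF)
#define IS_SURROGATE(ch)      ((ch) >= 0xD800 && (ch) <= 0xDFFF)

/* Combine a high and a low surrogate into one code point.  The caller is
   expected to have checked both halves first. */
#define JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) - 0xD800) << 10) | \
                ((uint32_t)(low) - 0xDC00)))

/* Checked wrapper (also hypothetical), useful in debug builds. */
static uint32_t
join_surrogates_checked(uint16_t high, uint16_t low)
{
    assert(IS_HIGH_SURROGATE(high) && IS_LOW_SURROGATE(low));
    return JOIN_SURROGATES(high, low);
}
```

With macros like these, each function that walks a string could test IS_HIGH_SURROGATE and IS_LOW_SURROGATE on adjacent code units and call JOIN_SURROGATES, instead of repeating the range checks and the arithmetic inline.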
----------
keywords: +patch
Added file: http://bugs.python.org/file19809/issue10521-isalpha.diff
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10521>
_______________________________________