[issue10542] Py_UNICODE_NEXT and other macros for surrogates
Antoine Pitrou
report at bugs.python.org
Tue Aug 16 11:18:46 CEST 2011
Antoine Pitrou <pitrou at free.fr> added the comment:
> I think the 4 macros:
> #define _Py_UNICODE_ISSURROGATE
> #define _Py_UNICODE_ISHIGHSURROGATE
> #define _Py_UNICODE_ISLOWSURROGATE
> #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.
I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)
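
To make the naming concrete, here is a rough sketch of what the four macros could look like with consistent word separation. The macro bodies are my assumption, filled in from the surrogate ranges in the Unicode standard (high: 0xD800-0xDBFF, low: 0xDC00-0xDFFF); they are not taken from the patch under review. `join_checked` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bodies: the ranges come from the Unicode standard, not from the
   patch. Each macro evaluates its argument more than once, so do not pass
   expressions with side effects. */
#define _Py_UNICODE_IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define _Py_UNICODE_IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define _Py_UNICODE_IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low pair into one code point in U+10000..U+10FFFF. */
#define _Py_UNICODE_JOIN_SURROGATES(high, low) \
    (0x10000 + ((((uint32_t)(high) & 0x03FF) << 10) | \
                ((uint32_t)(low) & 0x03FF)))

/* Hypothetical helper: join a pair after asserting that both halves are
   in the expected ranges. */
static uint32_t join_checked(uint16_t high, uint16_t low)
{
    assert(_Py_UNICODE_IS_HIGH_SURROGATE(high));
    assert(_Py_UNICODE_IS_LOW_SURROGATE(low));
    return _Py_UNICODE_JOIN_SURROGATES(high, low);
}
```

For example, the pair (0xD83D, 0xDE00) joins to U+1F600.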
> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
>
> They will also be used in many str methods and afaiu PEP 393 should
> address that. I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.
AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.
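
As an illustration of "only in the encoded form", here is a minimal sketch of the decoding direction: UTF-16 code units go in, full code points come out, and no surrogate pair survives into the output. `utf16_to_code_points` is a hypothetical helper, not a CPython API, and it makes the simplifying assumption that lone surrogates are passed through unchanged (a real codec would apply its error handler instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython API. Decodes n UTF-16 code units from
   `in` into code points in `out`, which must have room for n entries.
   Returns the number of code points written. Lone surrogates are copied
   through unchanged (an assumption made to keep the sketch short). */
static size_t utf16_to_code_points(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t count = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t ch = in[i];

        /* A high surrogate followed by a low surrogate forms one code point. */
        if (ch >= 0xD800 && ch <= 0xDBFF && i + 1 < n
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            ch = 0x10000 + (((ch & 0x03FF) << 10)
                            | ((uint32_t)in[i + 1] & 0x03FF));
            i++;  /* consume the low surrogate as well */
        }
        out[count++] = ch;
    }
    return count;
}
```

With this shape, the surrogate handling stays inside the codec and the decoded buffer holds one entry per code point.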
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________