[issue10542] Py_UNICODE_NEXT and other macros for surrogates

Antoine Pitrou report at bugs.python.org
Tue Aug 16 11:18:46 CEST 2011


Antoine Pitrou <pitrou at free.fr> added the comment:

> I think the 4 macros:
>  #define _Py_UNICODE_ISSURROGATE
>  #define _Py_UNICODE_ISHIGHSURROGATE
>  #define _Py_UNICODE_ISLOWSURROGATE
>  #define _Py_UNICODE_JOIN_SURROGATES
> are quite straightforward and can avoid using the trailing _.

I don't want to bikeshed, but can we have proper consistent word
separation?
_Py_UNICODE_IS_HIGH_SURROGATE, not _Py_UNICODE_ISHIGHSURROGATE
(etc.)

> > we will still have to deal with surrogates in codecs,
> > which is where these macros will get used
> 
> They will also be used in many str methods and afaiu PEP 393 should
> address that.  I'm not sure it addresses codecs and builtin functions
> like chr() and ord() too.

AFAIU, PEP 393 avoids producing surrogate pairs in the canonical
internal representation (that's one of its selling points). Only the
UTF-16 codecs would need to deal with surrogate pairs, in the encoded
form.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________


More information about the Python-bugs-list mailing list