[issue9804] ascii() does not always join surrogate pairs

STINNER Victor report at bugs.python.org
Thu Sep 9 00:50:58 CEST 2010


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

For unicode, ascii(x) is implemented as repr(x).encode('ascii', 'backslashreplace').decode('ascii').
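The equivalence above can be checked directly. This is only a minimal sketch: the helper name ascii_via_repr and the sample string are made up for illustration, and the comparison is meant for a current CPython 3 interpreter.

```python
def ascii_via_repr(obj):
    """Emulate ascii() as described: take repr(), then escape non-ASCII."""
    return repr(obj).encode('ascii', 'backslashreplace').decode('ascii')

# Hypothetical sample: one Latin-1 character and one non-BMP character.
sample = 'caf\xe9 \U0001d121'

print(ascii_via_repr(sample))
print(ascii_via_repr(sample) == ascii(sample))
```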

repr(x) is "'" + x + "'" for printable characters (e.g. U+1D121), and "'\\U%08x'" % ord(x) for non-printable characters (e.g. U+12FFF).
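A rough sketch of that rule for a single non-BMP character follows. In current CPython the non-printable branch produces a \UXXXXXXXX escape. The helper repr_char is hypothetical and deliberately ignores BMP characters, quotes and backslashes, which repr() handles with other escapes. U+1FFFF is used here as the non-printable example because it is a noncharacter; that choice is an assumption of this sketch, not taken from the report.

```python
def repr_char(ch):
    """Sketch of repr() for one non-BMP character (hypothetical helper)."""
    if ch.isprintable():
        # Printable: the character is kept as-is between quotes.
        return "'" + ch + "'"
    # Not printable: eight-digit \U escape.
    return "'\\U%08x'" % ord(ch)

printable = '\U0001d121'       # MUSICAL SYMBOL C CLEF
not_printable = '\U0001ffff'   # noncharacter, assumed example

print(repr_char(printable) == repr(printable))
print(repr_char(not_printable) == repr(not_printable))
```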

As for the unexpected output: the problem is that, in narrow builds, ascii+backslashreplace encodes printable non-BMP characters as b'\\uXXXX\\uXXXX', i.e. one escape per surrogate.
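Narrow builds no longer exist in current Python, so the behaviour can only be imitated. The sketch below assumes that a narrow build stores text as UTF-16 code units and escapes each unit separately; narrow_backslashreplace is a hypothetical helper, not an interpreter function.

```python
import struct

def narrow_backslashreplace(text):
    """Escape text one 16-bit code unit at a time, as a narrow build would."""
    data = text.encode('utf-16-be')
    units = struct.unpack('>%dH' % (len(data) // 2), data)
    out = []
    for unit in units:
        if unit < 0x80:
            out.append(chr(unit))          # plain ASCII is kept
        elif unit < 0x100:
            out.append('\\x%02x' % unit)   # two-digit escape
        else:
            out.append('\\u%04x' % unit)   # four-digit escape, surrogates included
    return ''.join(out)

print(narrow_backslashreplace('\U0001d121'))   # \ud834\udd21
```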

I don't see a simple solution for encoding non-BMP characters as b'\\UXXXXXXXX', because the principle of an error handler is that it escapes non-encodable characters one by one.
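To make the difficulty concrete: producing \UXXXXXXXX requires looking at two code units at once, which a one-character-at-a-time handler cannot do. The following is only an illustration of that look-ahead over a list of UTF-16 code units, not a proposed patch; escape_units_joining_pairs is a hypothetical name.

```python
def escape_units_joining_pairs(units):
    """Escape UTF-16 code units, joining valid surrogate pairs into \\U escapes."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if (0xD800 <= unit <= 0xDBFF and nxt is not None
                and 0xDC00 <= nxt <= 0xDFFF):
            # High surrogate followed by low surrogate: rebuild the code point.
            code = 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
            out.append('\\U%08x' % code)
            i += 2
            continue
        if unit < 0x80:
            out.append(chr(unit))
        elif unit < 0x100:
            out.append('\\x%02x' % unit)
        else:
            out.append('\\u%04x' % unit)   # BMP character or lone surrogate
        i += 1
    return ''.join(out)

print(escape_units_joining_pairs([0xD834, 0xDD21]))   # \U0001d121
```

A lone surrogate still falls back to a \uXXXX escape, since it has no partner to join with.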

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9804>
_______________________________________

