[issue9804] ascii() does not always join surrogate pairs
report at bugs.python.org
Thu Sep 9 00:50:58 CEST 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
For unicode, ascii(x) is implemented as repr(x).encode('ascii', 'backslashreplace').decode('ascii').
repr(x) is "'" + x + "'" for printable characters (e.g. U+1D121), and "'\\U%08x'" % ord(x) for non-printable characters (e.g. U+12FFF).
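
To make the equivalence above concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter, which has no narrow builds, and the helper name ascii_via_repr is made up for illustration; it is not a CPython function.

```python
def ascii_via_repr(x):
    """Illustrative equivalent of ascii(x) for a str, as described above."""
    return repr(x).encode('ascii', 'backslashreplace').decode('ascii')


printable = '\U0001D121'  # non-BMP, printable: repr() keeps it as is

# The error handler, not repr(), is what produces the escape here.
print(ascii_via_repr(printable))
print(ascii_via_repr(printable) == ascii(printable))
```

On such an interpreter the non-BMP character is escaped as a single \UXXXXXXXX sequence, because the string holds one code point rather than a surrogate pair.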
About the unexpected output, the problem is that ascii+backslashreplace encodes non-BMP printable characters as b'\\uXXXX\\uXXXX' in narrow builds.
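
The narrow-build behaviour can be approximated on a current interpreter by building the surrogate pair by hand. This is only a simulation under that assumption, and to_surrogate_pair is a hypothetical helper, not part of the standard library.

```python
def to_surrogate_pair(ch):
    """Split one non-BMP character into the two UTF-16 code units that
    a narrow build would store (hypothetical helper for illustration)."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("expected a non-BMP character")
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)    # high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
    return chr(high) + chr(low)


narrow = to_surrogate_pair('\U0001D121')

# backslashreplace sees two separate code units and escapes each one.
escaped = narrow.encode('ascii', 'backslashreplace')
print(escaped)
```

Each surrogate is escaped on its own, which gives the b'\\uXXXX\\uXXXX' form instead of a single b'\\UXXXXXXXX'.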
I don't see a simple solution to encode non-BMP characters as b'\\UXXXXXXXX', because the principle of an error handler is that it escapes non-encodable characters one by one.
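
For illustration only, here is a rough sketch of a custom error handler that looks one code unit ahead and joins a surrogate pair into a single escape. It is not the fix proposed or adopted for this issue. The handler name 'joinsurrogates' and the helper functions are assumptions invented for the example, and it relies on the handler being allowed to return a resume position of its own choosing, as codecs.register_error documents.

```python
import codecs


def _escape(cp):
    """Format one code point the way backslashreplace does."""
    if cp <= 0xFF:
        return '\\x%02x' % cp
    if cp <= 0xFFFF:
        return '\\u%04x' % cp
    return '\\U%08x' % cp


def join_surrogates(exc):
    """Hypothetical handler: escape one character, or one surrogate pair."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    s, i = exc.object, exc.start
    cp = ord(s[i])
    if 0xD800 <= cp <= 0xDBFF and i + 1 < len(s):
        low = ord(s[i + 1])
        if 0xDC00 <= low <= 0xDFFF:
            # Recombine the pair into the original code point.
            joined = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
            return _escape(joined), i + 2
    # Not a pair: escape just this character and resume after it.
    return _escape(cp), i + 1


codecs.register_error('joinsurrogates', join_surrogates)

print('\ud834\udd21'.encode('ascii', 'joinsurrogates'))
```

The look-ahead is what distinguishes this from a strictly one-by-one handler; a lone surrogate still falls back to the plain \uXXXX escape.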
Python tracker <report at bugs.python.org>