[issue10542] Py_UNICODE_NEXT and other macros for surrogates

Alexander Belopolsky report at bugs.python.org
Wed Dec 29 20:26:19 CET 2010


Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

On Wed, Dec 29, 2010 at 11:36 AM, Georg Brandl <report at bugs.python.org> wrote:
..
> That bug already strikes me as quite exotic.
>
Would it look as exotic if presented like this?

  File "<stdin>", line 1
    𐌀 = 5
       ^
SyntaxError: invalid character in identifier
(works on a wide build)

Note that with few exceptions, pretty much anything you can do with
supplementary characters will produce different results in wide and
narrow builds.  This includes all character type methods (isalpha,
isdigit, etc.), transformations such as case folding or normalization,
text formatting, and so on.
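
A minimal sketch of that difference. Narrow builds no longer exist in
CPython 3.3 and later, so it only simulates a narrow build by splitting
a string into UTF-16 code units. The helper name utf16_code_units and
the sample character U+10300 (OLD ITALIC LETTER A) are assumptions made
for illustration.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    This approximates what a narrow build stored per string element.
    """
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


ch = "\U00010300"  # assumed sample: a supplementary (non-BMP) letter
units = utf16_code_units(ch)

# One code point, but two code units (a surrogate pair).
print(len(ch), len(units))            # 1 2
print([hex(u) for u in units])        # ['0xd800', '0xdf00']

# A character type method sees a letter when given the whole code point,
# but each surrogate taken on its own is not a letter.
print(ch.isalpha())                       # True
print([chr(u).isalpha() for u in units])  # [False, False]
```

Any method that inspects one element at a time gets the second kind of
answer when the elements are code units instead of code points.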

When I suggested on python-dev that supplementary character support on
narrow builds is not worth violating fundamental invariants such as
len(chr(i)) == 1, pretty much everyone said that Python should support
full Unicode regardless of build.  Yet when it comes to fixing specific
differences between builds, I hear that these differences are not
important because no one is using supplementary characters.

This example is less exotic than, say, str.center() or str.swapcase()
not because it involves less exotic characters - all non-BMP
characters are exotic by definition - but because it involves the core
definition of the Python language.
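
A rough illustration of the identifier case, runnable on a current
CPython, where every build behaves like the old wide build. The sample
character U+10300 and the variable names are assumptions. The two-element
string of lone surrogates stands in for what a narrow build would hand
to the identifier check.

```python
name = "\U00010300"       # assumed sample: one supplementary letter
as_pair = "\ud800\udf00"  # the same character as two separate surrogates

# The whole code point is a valid identifier.
print(name.isidentifier())     # True

# Checked surrogate by surrogate, it is not.
print(as_pair.isidentifier())  # False

# Using the character as a variable name, as in the session above.
namespace = {}
exec(name + " = 5", namespace)
print(namespace[name])         # 5
```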

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________

