[issue45105] Incorrect handling of unicode character \U00010900

Sun Sep 5 09:56:38 EDT 2021

Eryk Sun <eryksun at gmail.com> added the comment:

> I think you may be mistaken. In Max's original post, he has
>   s = '000X'

It displays that way for me in Firefox on Linux, but what's actually there when I copy it from Firefox is '0\U0001090000', which matches the result Max gets for individual index operations such as s[1].
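One way to see the logical order is to list each code point with the standard library's unicodedata module. This view doesn't depend on how the browser or terminal renders the string. The sketch below assumes the copied value is exactly '0\U0001090000', as stated above. It prints escaped forms and character names rather than the raw characters, so the output itself isn't subject to the display reordering under discussion.

```python
import unicodedata

# Value as copied from the browser (assumption: taken verbatim from the comment above).
s = '0\U0001090000'

# ascii() escapes non-ASCII characters, so the output is not reordered on display.
print(ascii(s))  # '0\U0001090000'
print(len(s))    # 4

# List index, code point, and name for each character in logical (storage) order.
for i, ch in enumerate(s):
    print(i, f'U+{ord(ch):04X}', unicodedata.name(ch))
```

The loop shows DIGIT ZERO at index 0, PHOENICIAN LETTER ALF (U+10900) at index 1, and DIGIT ZERO at indexes 2 and 3. That is consistent with s[1] returning the Phoenician letter.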

The "0" characters that follow the R-T-L (right-to-left) character have weak directionality, so the string displays the same as "000\U00010900". If you print with spaces and use a number sequence, the substring starting with the R-T-L character should display reversed. For example, print(*'123\U00010900456') should display the same as print(*'123654\U00010900'). But "abc" in print(*'123\U00010900abc') should not display reversed, since it has strong L-T-R (left-to-right) directionality.
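The directionality classes mentioned above can be inspected with unicodedata.bidirectional(). This is only a sketch of the inputs to the Unicode Bidirectional Algorithm. Python's standard library does not implement the reordering itself; that is done by the browser or terminal. The helper name bidi_classes is hypothetical, introduced here for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Hypothetical helper: return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

# 'EN' = European Number (weak), 'R' = strong right-to-left, 'L' = strong left-to-right.
for sample in ('0\U0001090000', '123\U00010900456', '123\U00010900abc'):
    # ascii() keeps the printed sample in logical order regardless of the display.
    print(ascii(sample), bidi_classes(sample))
```

The digits report 'EN' (weak), U+10900 reports 'R', and "abc" reports 'L'. This matches the distinction drawn above: the digits adopt the surrounding R-T-L run, while the strong L-T-R letters do not.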

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45105>
_______________________________________