[Tutor] unicode question
Peter Otten
__peter__ at web.de
Mon Jun 6 17:56:29 EDT 2022
On 06/06/2022 22:04, Alex Kleider wrote:
> I've been playing around with unicode a bit and found that the
> following code doesn't behave as I might have expected:
>
> Python 3.9.2 (default, Feb 28 2021, 17:03:44)
> [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> print("\N{middle dot}")
> ·
>>>>
>>>> middle_dot = '0140', '013F', '00B7', '2027'
>>>> ucode = ['\\u' + dot for dot in middle_dot]
>>>> for dot in ucode:
> ...     print(dot)
> ...
> \u0140
> \u013F
> \u00B7
> \u2027
>>>> print("\u0140")
> ŀ
>>>> print("\u013f")
> Ŀ
>>>> print("\u00b7") # the one I want
> ·
>>>> print("\u2027")
> ‧
>>>>
>
> I was expecting the for loop to output the same as the last four print
> statements but, alas, not so.
"\\u" is a string containing the backslash followed by a "u" -- and that
won't change when you concatenate another string like "0140".
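A quick check, offered only as a sketch, makes the difference visible: the \u escape is translated when Python reads a string literal from source code, not later when strings are joined at runtime. The names joined and literal are made up for this illustration.

```python
# "\\u" + "0140" is six separate characters: backslash, u, 0, 1, 4, 0.
joined = "\\u" + "0140"

# The literal "\u0140" is translated by the parser into a single character.
literal = "\u0140"

print(len(joined), len(literal))
print(joined == literal)
```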
The easiest way to realize the loop would be to use integers:
>>> for i in 0x140, 0x13f: print(chr(i))
ŀ
Ŀ
The obvious way when you want to start with strings is
>>> for c in "0140", "013f":
...     print(eval(f"'\\u{c}'"))  # dangerous, may execute arbitrary code
...
ŀ
Ŀ
with the safe alternative
>>> import codecs
>>> for c in "0140", "013f":
...     print(codecs.decode(f"\\u{c}", "unicode-escape"))
...
ŀ
Ŀ
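If the code points arrive as hex strings, as in the original question, another safe option is to combine the two ideas: convert each string to an integer with int(c, 16) and hand that to chr(). This is only a sketch and not something from the session above. The tuple middle_dot is taken from the question, while the name chars is introduced here for illustration.

```python
# Hex code points as strings, exactly as in the original question.
middle_dot = '0140', '013F', '00B7', '2027'

# int(code, 16) parses the hex digits; chr() maps the number to a character.
chars = [chr(int(code, 16)) for code in middle_dot]

for ch in chars:
    print(ch)
```

No escape processing is involved here, so nothing in the input string can be interpreted as code.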