Problem with Unicode char in Python 3.3.0
tjreedy at udel.edu
Tue Jan 8 09:40:47 CET 2013
On 1/7/2013 8:12 AM, Terry Reedy wrote:
> On 1/7/2013 7:57 AM, Franck Ditter wrote:
>> >>> print('\U0001d11e')
>> Traceback (most recent call last):
>> File "<pyshell#1>", line 1, in <module>
>> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
>> in position 0: Non-BMP character not supported in Tk
> The message comes from printing to a tk text widget (the IDLE shell),
> not from creating the 1 char string. c = '\U0001d11e' works fine. When
> you have problems with creating and printing unicode, *separate*
> creating from printing to see where the problem is. (I do not know if
> the brand new tcl/tk 8.6 is any better.)
> The windows console also chokes, but with a different message.
> >>> c='\U0001d11e'
> >>> print(c)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
> in position 0: character maps to <undefined>
> Yes, this is very annoying, especially in Win 7.
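To make the "separate creating from printing" advice concrete, here is a
minimal sketch. The helper name safe_print is hypothetical, not anything
in the stdlib. It assumes the failure happens when the output stream
encodes the text (as with the cp437 console above); it will not help
where the display layer itself rejects the character, as Tk does in the
IDLE shell.

```python
import sys

def safe_print(text, stream=None):
    """Print text, showing unencodable characters as backslash escapes.

    Hypothetical helper for illustration only.
    """
    if stream is None:
        stream = sys.stdout
    # Some streams report no encoding; fall back to ASCII in that case.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's encoding so that anything it cannot
    # represent becomes an escape instead of raising UnicodeEncodeError.
    shown = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(shown, file=stream)

c = '\U0001d11e'            # step 1: creating the string works everywhere
print(len(c), ascii(c))     # step 2: inspect it using ASCII-only output
safe_print(c)               # step 3: print without risking a traceback
```

If step 1 and step 2 succeed and only a plain print(c) fails, the problem
is in the output device, not in the string.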
The above is in 3.3, in which '\U0001d11e' is actually translated to a
length 1 string. In 3.2 and earlier, on narrow builds (as on Windows),
that literal is translated to a length 2 string: a surrogate pair, each
half of which is a code point in the BMP. On printing, the pair of
surrogates got translated to the square box used for all characters for
which the font does not have a glyph. When cut and pasted, it shows in
this mail composer as a weird music sign. The next line is 3 -s,
3 spaces, paste, 3 spaces, 3 -s, but the character may disappear.
--- 𝄞 ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on
printing.
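
For anyone curious how the length 2 form arises, the following sketch
computes the UTF-16 surrogate pair that a narrow build stores for a
non-BMP character. The function name to_surrogate_pair is made up for
this illustration; the arithmetic is the standard UTF-16 encoding rule.

```python
def to_surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for one non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is in the BMP; no surrogates needed")
    offset = cp - 0x10000               # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low

c = '\U0001d11e'
print(len(c))                                   # 1 on 3.3 and later
print([hex(u) for u in to_surrogate_pair(c)])   # ['0xd834', '0xdd1e']
print(len(c.encode('utf-16-le')) // 2)          # 2 UTF-16 code units
```

Both surrogates lie in the range U+D800 to U+DFFF, which is inside the
BMP, so a UCS-2 consumer sees two characters it has no glyph for.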
Terry Jan Reedy