Python 3.0b2 cannot map '\u12b'
Terry Reedy
tjreedy at udel.edu
Mon Sep 1 02:27:54 EDT 2008
Tim Roberts wrote:
> josh logan <dear.jay.logan at gmail.com> wrote:
>> I am using Python 3.0b2.
>> I have an XML file that has the unicode character '\u012b' in it,
>> which, when parsed, causes a UnicodeEncodeError:
>>
>> 'charmap' codec can't encode character '\u012b' in position 26:
>> character maps to <undefined>
>>
>> This happens even when I assign this character to a reference in the
>> interpreter:
>>
>> Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> s = '\u012b'
>> >>> s
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\Python30\lib\io.py", line 1428, in write
>>     b = encoder.encode(s)
>>   File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
>>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>> UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
>>
>> Is this a known issue, or am I doing something wrong?
>
> Both. U+012B is the Latin lower-case i with macron (i with a bar instead
> of a dot). That character does not exist in the 8-bit character set CP437,
> which you are trying to use.
>
> If you choose an 8-bit character set that includes i-with-macron, then it
> will work. UTF-8 would be a good choice. It's in ISO-8859-10.
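Tim's point about the character sets can be checked directly with str.encode. The following is only a sketch; the helper name can_encode is made up for illustration. Note that UTF-8 is a variable-width encoding rather than an 8-bit character set, so the i-macron comes out as two bytes there.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

I_MACRON = '\u012b'  # LATIN SMALL LETTER I WITH MACRON

# cp437 has no slot for this character; ISO-8859-10 and UTF-8 do.
for name in ('cp437', 'iso8859_10', 'utf-8'):
    print(name, can_encode(I_MACRON, name))
```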
I doubt the OP 'chose' cp437. Why is Python using cp437 even when the
default encoding is utf-8?
On WinXP:
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> s='\u012b'
>>> s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python30\lib\io.py", line 1428, in write
    b = encoder.encode(s)
  File "C:\Program Files\Python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
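The traceback goes through io.py's write, which suggests the encoding that matters is the one attached to the output stream, not the one sys.getdefaultencoding() reports. A minimal sketch for looking at the two side by side; encoding_report is a hypothetical helper, and the in-memory cp437 stream merely stands in for a Windows console:

```python
import io
import sys

def encoding_report(stream=None):
    """Return the interpreter default encoding and the given stream's encoding."""
    stream = sys.stdout if stream is None else stream
    return {
        "default": sys.getdefaultencoding(),
        "stream": getattr(stream, "encoding", None),
    }

# Stand-in for a console whose code page is cp437 (an assumption for the demo).
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
print(encoding_report(fake_console))
```

Calling encoding_report() with no argument reports on the real sys.stdout instead.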
To put it another way, how can one 'choose' utf-8 for display to screen?
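One workaround that does not depend on the console at all is to escape whatever the target encoding lacks before writing. This is a sketch, not a fix for the interpreter; escaped_repr is a made-up name. Merely declaring the stream to be utf-8 would avoid the exception, but a console that really is cp437 would then show the wrong glyphs.

```python
import sys

def escaped_repr(obj, encoding=None):
    """repr(obj) with characters missing from `encoding` shown as \\u escapes."""
    # Fall back to the stream's own encoding, then to plain ASCII.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    raw = repr(obj).encode(encoding, errors="backslashreplace")
    return raw.decode(encoding)

print(escaped_repr('\u012b', 'cp437'))
# The built-in ascii() escapes every non-ASCII character unconditionally.
print(ascii('\u012b'))
```

In Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace") applies the same escaping to the stream itself.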
Using IDLE, display works fine.
IDLE 3.0b2
>>> s='\u012b'
>>> s
'ī' # i macron
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
I ran across this in a different context and mentioned it on the bug
tracker; the Windows interpreter seems broken here.
I will send this in UTF-8 so the i-macron will hopefully show up.
tjr