Python 3.0b2 cannot map '\u12b'

Tim Roberts timr at probo.com
Mon Sep 1 06:24:29 CEST 2008


josh logan <dear.jay.logan at gmail.com> wrote:
>
>I am using Python 3.0b2.
>I have an XML file that has the unicode character '\u012b' in it,
>which, when parsed, causes a UnicodeEncodeError:
>
>'charmap' codec can't encode character '\u012b' in position 26: character maps to <undefined>
>
>This happens even when I assign this character to a reference in the
>interpreter:
>
>Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
>Type "help", "copyright", "credits" or "license" for more information.
>>>> s = '\u012b'
>>>> s
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "C:\Python30\lib\io.py", line 1428, in write
>    b = encoder.encode(s)
>  File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
>
>Is this a known issue, or am I doing something wrong?

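Both.  U+012B is the Latin lower-case i with macron (i with a bar instead
of a dot).  That character does not exist in CP437, the 8-bit character set
your Windows console uses and therefore the one Python encodes console
output with (note encodings\cp437.py in the traceback).  The assignment
itself succeeds; it is echoing s at the prompt that fails.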
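You can confirm this from Python itself by trying to encode the character with each candidate codec.  This is a minimal sketch: can_encode is a hypothetical helper, and the only real API it relies on is the built-in str.encode with the standard codec names.

```python
# Sketch: which codecs can represent U+012B (LATIN SMALL LETTER I WITH MACRON)?
# can_encode is a hypothetical helper; it relies only on the built-in str.encode.
CHAR = "\u012b"

def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` encodes with `encoding` under the default 'strict' errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for enc in ("cp437", "iso8859_10", "utf-8"):
    print(enc, can_encode(CHAR, enc))
```

On a standard CPython 3 this reports False for cp437 and True for iso8859_10 and utf-8.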

If you choose an encoding that includes i-with-macron, it will work.
UTF-8, which can encode every Unicode character, would be a good choice.
Among 8-bit character sets, it's in ISO-8859-10 (Latin-6).
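The effect of the output encoding can be sketched without touching the real console by pushing the text through the same io.TextIOWrapper layer that sys.stdout uses, backed by an in-memory buffer.  The helper name encoded_output is hypothetical; io.TextIOWrapper, io.BytesIO and the "backslashreplace" error handler are standard.

```python
import io

CHAR = "\u012b"

def encoded_output(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write `text` through a TextIOWrapper (the text layer sys.stdout uses) and return the bytes produced."""
    buf = io.BytesIO()
    # newline="\n" turns off platform newline translation so the result is predictable.
    wrapper = io.TextIOWrapper(buf, encoding=encoding, errors=errors, newline="\n")
    wrapper.write(text)   # with errors="strict" and cp437 this raises UnicodeEncodeError
    wrapper.flush()
    return buf.getvalue()

print(encoded_output(CHAR, "utf-8"))                      # b'\xc4\xab'
print(encoded_output(CHAR, "cp437", "backslashreplace"))  # b'\\u012b'
```

UTF-8 bytes only display correctly if the console also decodes them as UTF-8; "backslashreplace" is the fallback when the console must stay on CP437.  In later Python 3 releases the console stream itself can be switched with the PYTHONIOENCODING environment variable or, from 3.7, sys.stdout.reconfigure(encoding="utf-8"), and since 3.6 the Windows console is written through its Unicode API, so this particular error no longer appears there.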
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.