Python 3.0b2 cannot map '\u012b'
Tim Roberts
timr at probo.com
Mon Sep 1 00:24:29 EDT 2008
josh logan <dear.jay.logan at gmail.com> wrote:
>
>I am using Python 3.0b2.
>I have an XML file that has the unicode character '\u012b' in it,
>which, when parsed, causes a UnicodeEncodeError:
>
>'charmap' codec can't encode character '\u012b' in position 26:
>character maps to <undefined>
>
>This happens even when I assign this character to a reference in the
>interpreter:
>
>Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
>Type "help", "copyright", "credits" or "license" for more information.
>>>> s = '\u012b'
>>>> s
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "C:\Python30\lib\io.py", line 1428, in write
>    b = encoder.encode(s)
>  File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
>UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
>
>Is this a known issue, or am I doing something wrong?
Both. U+012B is the Latin lower-case i with macron (an i with a bar instead
of a dot). That character does not exist in CP437, the 8-bit character set
your console is using, so Python cannot encode it when it tries to display
the string.
If you choose an encoding that includes i-with-macron, then it will work.
UTF-8, which can encode every Unicode character, would be a good choice.
Among the 8-bit character sets, ISO-8859-10 includes it.
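To see the difference between the encodings, here is a minimal sketch that tries to encode the same character with each of them. The helper name try_encode is made up for illustration, and the sketch only exercises str.encode; it does not change the encoding your console actually uses.

```python
s = '\u012b'  # LATIN SMALL LETTER I WITH MACRON

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# cp437 has no slot for U+012B; utf-8 and iso8859_10 both do.
for name in ('cp437', 'utf-8', 'iso8859_10'):
    print(name, try_encode(s, name))

# ascii() returns an escaped, ASCII-only representation, so it can be
# printed even when the console encoding lacks the character.
print(ascii(s))
```

If the console has to stay on CP437, printing ascii(s) instead of s at least lets you inspect the string without triggering the UnicodeEncodeError.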
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.