Eurosymbol in xml document

Stephan Diehl stephan.diehl at gmx.net
Tue Mar 4 07:36:08 EST 2008


Hello Helmut,

> Hi,
> i'm new here in this list.
> 
> i'm developing a little program using an xml document. So far it's easy
> going, but when parsing an xml document which contains the EURO symbol
> ('€') then I get an error:
> 
> UnicodeEncodeError: 'charmap' codec can't encode character u'\xa4' in
> position 11834: character maps to <undefined>

First of all, Unicode handling is a little bit difficult when you encounter
it for the first time, but in the end it really makes a lot of sense :-)
Please read a Python Unicode tutorial such as
http://www.amk.ca/python/howto/unicode

If you open up a python interactive prompt, you can do the following:
>>> print u'\u20ac'
€
>>> u'\u20ac'.encode('utf-8')
'\xe2\x82\xac'
>>> u'\u20ac'.encode('iso-8859-15')
'\xa4'
>>> u'\u20ac'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in
position 0: 

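\u20ac is the Unicode code point for the euro sign, so u'\u20ac' is the
euro sign as a unicode string in Python. The different encode calls
translate that unicode string into the bytes of an actual encoding.
What you are seeing in your xml document is the iso-8859-15 encoded euro
sign, the byte '\xa4'. Read as iso-8859-1, that same byte is the generic
currency sign u'\xa4', which is probably where the character in your error
message comes from. As Diez already noted, you must make sure that either
1. the whole xml document is encoded in iso-8859-15 (Latin-9) and the
encoding header reflects that,
or
2. the whole xml document is encoded in utf-8 and contains the utf-8
encoded euro sign.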
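
A minimal sketch of both options, in case it helps. Note the assumptions:
it is written for Python 3 (the session above is Python 2), it uses
xml.etree.ElementTree from the standard library rather than whatever parser
you are using, and the <price> element is made up for illustration.

```python
# Python 3 sketch; the <price> element is a hypothetical example document.
import xml.etree.ElementTree as ET

EURO = '\u20ac'

# Option 1: iso-8859-15 bytes plus a matching encoding declaration.
latin9_doc = (
    '<?xml version="1.0" encoding="iso-8859-15"?>'
    '<price>10 ' + EURO + '</price>'
).encode('iso-8859-15')

# Option 2: utf-8 bytes plus a matching encoding declaration.
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<price>10 ' + EURO + '</price>'
).encode('utf-8')

# Parsing from bytes lets the parser apply the declared encoding itself.
latin9_text = ET.fromstring(latin9_doc).text
utf8_text = ET.fromstring(utf8_doc).text

# The same byte read as iso-8859-1 is U+00A4, the generic currency sign,
# which is the character named in the quoted error message.
misread = b'\xa4'.decode('iso-8859-1')

print(repr(latin9_text), repr(utf8_text), repr(misread))
```

In both cases the parser should hand back the same text containing the
euro sign; only the bytes on disk differ. Trouble starts when the bytes
and the declared encoding disagree.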

Hope that makes sense

Stephan



