Eurosymbol in xml document
Robert.Bossy at jouy.inra.fr
Tue Mar 4 13:47:57 CET 2008
Diez B. Roggisch wrote:
> Hellmut Weber wrote:
>> i'm new here in this list.
>> i'm developing a little program using an xml document. So far it's easy
>> going, but when parsing an xml document which contains the EURO symbol
>> ('€') then I get an error:
>> UnicodeEncodeError: 'charmap' codec can't encode character u'\xa4' in
>> position 11834: character maps to <undefined>
>> the relevant piece of code is:
>> from xml.dom.minidom import Document, parse, parseString
>> doc = parse(inFIleName)
> The contents of the file must be encoded with the proper encoding which is
> given in the XML-header, or has to be utf-8 if no header is given.
> From the above I think you have a latin1-based document. Does the encoding
> header match?
If the file is declared as latin-1 and contains a euro symbol, then the
file is actually mislabeled, since the euro is not defined in iso-8859-1
(byte 0xA4 there is the generic currency sign, which is the u'\xa4' in
your error message). If there is no encoding declaration, as Diez already
said, the file should be encoded as utf-8.
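To see why a latin-1 declaration cannot be right for a file holding a euro sign, here is a minimal sketch that tries to encode the character in each candidate encoding. It is written for a current Python 3 interpreter rather than the 2008-era Python of this thread, and the helper name try_encode is only an illustration.

```python
EURO = "\u20ac"  # the euro sign

def try_encode(char, encoding):
    """Return the encoded bytes, or None if the encoding has no such character."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None

latin1_bytes = try_encode(EURO, "iso-8859-1")   # None: no euro in latin-1
latin9_bytes = try_encode(EURO, "iso-8859-15")  # b'\xa4'
utf8_bytes = try_encode(EURO, "utf-8")          # b'\xe2\x82\xac'
```

So a single 0xA4 byte in the file points to iso-8859-15, while a three-byte sequence points to utf-8.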
Try replacing the encoding declaration (or adding one) with iso-8859-15
(also called latin-9, not latin-15), which is the same as latin-1 with a
few changes, including the euro symbol:
<?xml version="1.0" encoding="iso-8859-15" ?>
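As a rough check of what the declaration changes, the sketch below feeds minidom the same byte 0xA4 under both declarations. It assumes a current Python 3 interpreter, and the <price> element and the element_text helper are made up for the example.

```python
from xml.dom.minidom import parseString

def element_text(xml_bytes):
    """Parse an XML byte string and return the text of its root element."""
    doc = parseString(xml_bytes)
    return doc.documentElement.firstChild.data

# The same byte 0xA4, declared under two different encodings.
as_latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?><price>\xa4</price>'
as_latin9 = b'<?xml version="1.0" encoding="iso-8859-15"?><price>\xa4</price>'

latin1_text = element_text(as_latin1)  # U+00A4, the generic currency sign
latin9_text = element_text(as_latin9)  # U+20AC, the euro sign
```

The bytes on disk are identical in both cases; only the header decides which character the parser hands back.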
If your file has a lot of unusual diacritics, you might take a look at the
small differences between latin-1 and latin-9 (iso-8859-15) in order to
make sure that your file won't be broken:
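One way to list those differences without a reference table is to let Python's own codecs compute them. This is only a sketch for a current Python 3 interpreter; the function name charset_differences is an assumption of the example.

```python
def charset_differences(first="iso-8859-1", second="iso-8859-15"):
    """Return (byte, char_in_first, char_in_second) for every byte that differs."""
    diffs = []
    for byte in range(256):
        raw = bytes([byte])
        a = raw.decode(first)
        b = raw.decode(second)
        if a != b:
            diffs.append((byte, a, b))
    return diffs

diffs = charset_differences()

if __name__ == "__main__":
    # Print code points rather than the characters themselves, so the
    # output does not depend on the console encoding.
    for byte, a, b in diffs:
        print("0x%02X: U+%04X -> U+%04X" % (byte, ord(a), ord(b)))
```

Only eight byte values differ (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE), so a file is affected only if it uses one of the latin-1 characters at those positions, such as the fraction signs or the broken bar.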