Eurosymbol in xml document

Robert Bossy Robert.Bossy at jouy.inra.fr
Tue Mar 4 13:47:57 CET 2008


Diez B. Roggisch wrote:
> Hellmut Weber wrote:
>
>   
>> Hi,
>> i'm new here in this list.
>>
>> i'm developing a little program using an xml document. So far it's easy
>> going, but when parsing an xml document which contains the EURO symbol
>> ('€') then I get an error:
>>
>> UnicodeEncodeError: 'charmap' codec can't encode character u'\xa4' in
>> position 11834: character maps to <undefined>
>>
>> the relevant piece of code is:
>>
>> from xml.dom.minidom import Document, parse, parseString
>> ...
>> doc = parse(inFIleName)
>>     
>
> The contents of the file must be encoded with the proper encoding which is
> given in the XML-header, or has to be utf-8 if no header is given.
>
> From the above I think you have a latin1-based document. Does the encoding
> header match?
If the file is declared as latin-1 and contains a euro symbol, then the 
file is actually invalid, since the euro is not defined in iso-8859-1. If 
there is no encoding declaration, as Diez already said, the file should 
be encoded as utf-8.
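
To make this concrete, here is a minimal sketch, assuming Python 3 (the 
original code in this thread is Python 2). It shows that the euro sign 
cannot be encoded in iso-8859-1, and that the byte 0xA4 means two 
different characters depending on the declared encoding. The variable 
names are only illustrative. The u'\xa4' in the error message above would 
be consistent with a 0xA4 byte being read as latin-1.

```python
# The single byte 0xA4 is the generic currency sign in iso-8859-1
# and the euro sign in iso-8859-15.
raw = b"\xa4"
as_latin1 = raw.decode("iso-8859-1")    # U+00A4 CURRENCY SIGN
as_latin15 = raw.decode("iso-8859-15")  # U+20AC EURO SIGN

print("iso-8859-1 :", hex(ord(as_latin1)))
print("iso-8859-15:", hex(ord(as_latin15)))

# iso-8859-1 has no code point for the euro sign, so encoding fails.
try:
    "\u20ac".encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print("euro encodable in iso-8859-1:", euro_fits_latin1)
```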

Try replacing the encoding declaration (or adding one) with latin-9 
(iso-8859-15), which is the same as latin-1 with a few changes, 
including the euro symbol:

    <?xml version="1.0" encoding="iso-8859-15" ?>
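
As a rough illustration of the effect of that declaration, the sketch 
below parses a tiny in-memory document with minidom. It assumes Python 3, 
and the <price> element and its content are made up for the example; they 
are not taken from the original poster's file.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document: the byte 0xA4 follows the number.
xml_bytes = (
    b'<?xml version="1.0" encoding="iso-8859-15" ?>'
    b"<price>10 \xa4</price>"
)

doc = parseString(xml_bytes)
text = doc.documentElement.firstChild.data

# With the iso-8859-15 declaration, 0xA4 is read as the euro sign.
print(repr(text), hex(ord(text[-1])))
```

With encoding="iso-8859-1" in the declaration, the same byte would come 
out as U+00A4, the generic currency sign.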


If your file has a lot of unusual diacritics, you might take a look at the 
small differences between latin-1 and latin-9 (iso-8859-15) in order to 
make sure that your file won't be broken:
    http://en.wikipedia.org/wiki/ISO_8859-15
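
If you prefer to check the differences programmatically, a small sketch 
like the following (again assuming Python 3; the variable names are just 
for illustration) lists every byte value that decodes to a different 
character in the two encodings:

```python
# Compare the two single-byte encodings over all 256 byte values.
differing = []
for value in range(256):
    byte = bytes([value])
    in_latin1 = byte.decode("iso-8859-1")
    in_latin15 = byte.decode("iso-8859-15")
    if in_latin1 != in_latin15:
        differing.append(value)
        print(
            "0x%02X: U+%04X in iso-8859-1, U+%04X in iso-8859-15"
            % (value, ord(in_latin1), ord(in_latin15))
        )
```

If none of the listed bytes occur in your file, switching the declaration 
should not change how anything else is read.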

Cheers,
RB


