ElementTree XML parsing problem

Wed Apr 27 16:32:22 EDT 2011

On 4/27/2011 12:33 PM, Hegedüs Ervin wrote:
> hello,
>
>> I'm using ElementTree to parse an XML file, but it stops at the
>> second record (id = 002), which contains a non-ASCII
>> character, ä. Here's the XML:
>>
>> <?xml version="1.0"?>
>> <snapshot time="Mon Apr 25 08:47:23 PDT 2011">
>> <records>
>> <record id="001" education="High School" employment="7 yrs" />
>> <record id="002" education="Universität Bremen" employment="3 years" />
>> <record id="003" education="River College" employment="5 yrs" />
>> </records>
>> </snapshot>
>>
>> The complaint offered up by the parser is
>
> I've checked this XML with your script, and I think your locale
> settings are not right.
>
> $ ./parse.py
>
> XML file: test.xml
> 001 High School
> 002 Universität Bremen
> 003 River College
>
> (name of xml file is "test.xml")
>
> So I started changing the encoding declaration of the XML:
>
> <?xml version="1.0" encoding="UTF-8" ?>  - same result
> <?xml version="1.0" encoding="ISO-8859-2" ?>  - same result
> <?xml version="1.0" encoding="ISO-8859-1" ?>  - same result
>
> and then:
> <?xml version="1.0" encoding="ascii" ?>  - gives the same error you
> described.
>
> Try changing the XML encoding.
>
>
> a.

Thanks, Hegedüs and everyone else who responded. That is exactly it - 
I'm afraid I probably missed it in the docs because I was searching for 
terms like "unicode" and "coerce." In any event, that solves the 
problem. Thanks!
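
For anyone who finds this thread later, here is a minimal sketch of the
effect described above, written for Python 3. The helper name try_parse
and the inline copy of the XML are my own for illustration and are not
taken from the original script. It stores the document in one byte
encoding, declares another in the XML prolog, and reports whether
ElementTree can parse the result. My assumption is that the original
file was saved in a single-byte encoding such as ISO-8859-1 while the
prolog declared no encoding, so the parser fell back to UTF-8.

```python
import xml.etree.ElementTree as ET

# Same records as in the thread; \u00e4 is the "a with umlaut" character.
BODY = (
    '<snapshot time="Mon Apr 25 08:47:23 PDT 2011">'
    '<records>'
    '<record id="001" education="High School" employment="7 yrs" />'
    '<record id="002" education="Universit\u00e4t Bremen" employment="3 years" />'
    '<record id="003" education="River College" employment="5 yrs" />'
    '</records>'
    '</snapshot>'
)


def try_parse(declared, actual):
    """Encode the document as `actual`, declare it as `declared`, and parse.

    Returns a list of (id, education) pairs, or None if the parser
    rejects the bytes. Hypothetical helper, for illustration only.
    """
    prolog = '<?xml version="1.0" encoding="%s"?>' % declared
    data = (prolog + BODY).encode(actual)
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    return [(rec.get("id"), rec.get("education")) for rec in root.iter("record")]


if __name__ == "__main__":
    for declared, actual in [
        ("ISO-8859-1", "latin-1"),  # declaration matches the bytes
        ("UTF-8", "utf-8"),         # declaration matches the bytes
        ("UTF-8", "latin-1"),       # lone 0xE4 byte is not valid UTF-8
        ("US-ASCII", "latin-1"),    # 0xE4 is outside the ASCII range
    ]:
        print(declared, actual, try_parse(declared, actual))
```

The first two combinations should return all three records, and the
last two should return None. The practical fix is the one suggested
above: make the encoding declared in the prolog match the encoding the
file was actually saved in.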

-- Mike --