ElementTree XML parsing problem

Philip Semanchuk philip at semanchuk.com
Wed Apr 27 15:32:55 EDT 2011


On Apr 27, 2011, at 2:26 PM, Mike wrote:

> I'm using ElementTree to parse an XML file, but it stops at the second record (id = 002), which contains a non-ASCII character, ä. Here's the XML:
> 
> <?xml version="1.0"?>
> <snapshot time="Mon Apr 25 08:47:23 PDT 2011">
> <records>
> <record id="001" education="High School" employment="7 yrs" />
> <record id="002" education="Universität Bremen" employment="3 years" />
> <record id="003" education="River College" employment="5 yrs" />
> </records>
> </snapshot>
> 
> The complaint offered up by the parser is
> 
> Unexpected error opening simple_fail.xml: not well-formed (invalid token): line 5, column 40

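You've already gotten a number of good observations and suggestions. I would add that if you're saving your XML file from a text editor, make sure you're saving it as UTF-8 and not ISO-8859-1 or Windows-1252. Your XML declaration doesn't specify an encoding, so the parser assumes UTF-8, and an ä saved in either of those other encodings is a single byte (0xE4) that isn't valid UTF-8 on its own. That would explain the "invalid token" error.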
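
Here's a minimal sketch of what I mean, assuming Python 3 and a cut-down version of your XML. The try_parse helper and the variable names are mine, made up for illustration. It parses the same document saved as UTF-8, saved as ISO-8859-1 with no encoding declared, and saved as ISO-8859-1 with the encoding declared in the XML declaration.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the XML from the question; \u00e4 is the letter ä.
XML_TEXT = (
    '<?xml version="1.0"?>\n'
    '<records>\n'
    '<record id="002" education="Universit\u00e4t Bremen" />\n'
    '</records>\n'
)


def try_parse(data):
    """Hypothetical helper: return the education attribute of the first
    record, or None if the parser rejects the bytes as not well-formed."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    return root.find("record").get("education")


# Saved as UTF-8: matches the parser's default assumption.
utf8_result = try_parse(XML_TEXT.encode("utf-8"))

# Saved as ISO-8859-1 with no encoding declared: the lone 0xE4 byte
# is not valid UTF-8, so the parser gives up.
latin1_result = try_parse(XML_TEXT.encode("iso-8859-1"))

# Saved as ISO-8859-1 *and* declared as such: the parser decodes it correctly.
declared_text = XML_TEXT.replace(
    '<?xml version="1.0"?>',
    '<?xml version="1.0" encoding="ISO-8859-1"?>',
)
declared_result = try_parse(declared_text.encode("iso-8859-1"))

print(utf8_result, latin1_result, declared_result)
```

So there are two ways out: re-save the file as UTF-8, or keep the file as it is and declare its real encoding in the first line.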


bye
Philip



