problem parsing utf-8 encoded xml - minidom

ashmir.d at
Fri Jul 4 09:28:27 CEST 2008

On Jul 4, 2:36 pm, "Martin v. Löwis" <mar... at> wrote:
> > The parser is failing on this line:
> > <mrcb245-c>Heinrich Kèufner, Norbert Nedopil, Heinz Schèoch (Hrsg.).</
> > mrcb245-c>
> If it is literally this line, it's no surprise: there must not be a line
> break between the slash and the closing element name.
> However, since you are getting the error in a different column, it's
> indeed more likely that there is a problem with the encoding.
> Given that the Python UTF-8 codec refuses the data, most likely, the
> data is *not* encoded in UTF-8 (but perhaps in Latin-1). If so, you
> need to prefix the XML document with a proper XML declaration, such
> as
> <?xml version="1.0" encoding="iso-8859-1"?>
> Alternatively, make sure that the file is really encoded in UTF-8.
> Regards,
> Martin

There is no line break in the XML file. That was just a formatting
artifact of posting on this forum.

However, you were right that the encoding is not UTF-8. The XML file
is autogenerated by a different script, so that is probably where it
goes wrong.
The parser works fine if I change the first line to
<?xml version="1.0" encoding="iso-8859-1"?>
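
For anyone hitting the same error, here is a minimal sketch of the behaviour described above. It assumes Python 3 and uses a made-up one-element document encoded as Latin-1; the sample text and the helper name `parses` are illustrative, not taken from the real file.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample data: one element whose text holds a non-ASCII
# character, stored as ISO-8859-1 (Latin-1) bytes rather than UTF-8.
body = '<mrcb245-c>Heinrich Kèufner</mrcb245-c>'.encode('iso-8859-1')

def parses(data):
    """Return True if minidom accepts the bytes, False on a parse error."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False

# No XML declaration: the parser assumes UTF-8 and rejects the Latin-1 byte.
without_decl = parses(body)

# With a declaration naming the real encoding, the same bytes parse.
declaration = b'<?xml version="1.0" encoding="iso-8859-1"?>'
with_decl = parses(declaration + body)

print(without_decl, with_decl)
```

The declaration only helps when it names the encoding the bytes were actually written in. If the generating script is meant to produce UTF-8, fixing its output encoding is the other option Martin mentioned.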

Thank you very much
