problem parsing utf-8 encoded xml - minidom

"Martin v. Löwis" martin at v.loewis.de
Fri Jul 4 02:36:56 EDT 2008


> The parser is failing on this line:
> 
> <mrcb245-c>Heinrich Kèufner, Norbert Nedopil, Heinz Schèoch (Hrsg.).</
> mrcb245-c>

If it is literally this line, it's no surprise: there must not be a line
break between the slash and the closing element name.
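
To illustrate, here is a minimal sketch for current Python 3. The element
name "a" and the helper is_well_formed are made up for this example. It
shows that minidom accepts white space after the element name in an end tag,
but not between "</" and the name:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if minidom can parse xml_text, False on a parse error."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


# End tag written on one line: accepted.
ok = is_well_formed("<a>text</a>")

# Line break between "</" and the element name: rejected.
broken = is_well_formed("<a>text</\na>")

# White space *after* the element name is allowed by the XML grammar.
trailing = is_well_formed("<a>text</a\n>")
```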

However, since you are getting the error in a different column, it's
indeed more likely that there is a problem with the encoding.

Given that the Python UTF-8 codec refuses the data, the data is most
likely *not* encoded in UTF-8 (but perhaps in Latin-1). If so, you
need to prefix the XML document with a proper XML declaration, such
as

<?xml version="1.0" encoding="iso-8859-1"?>
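
As a rough sketch of the effect (current Python 3; the one-element sample
document is invented for illustration and assumes the data really is
Latin-1), the same bytes fail without the declaration and parse with it:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: a one-element document stored as Latin-1 bytes.
body = "<name>Sch\u00f6ch</name>".encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8 and rejects byte 0xF6.
try:
    minidom.parseString(body)
    parsed_without_declaration = True
except ExpatError:
    parsed_without_declaration = False

# With the declaration prepended, the bytes are read as Latin-1.
declaration = b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
doc = minidom.parseString(declaration + body)
name = doc.documentElement.firstChild.data
```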

Alternatively, make sure that the file is really encoded in UTF-8.
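
One way to check this, sketched for current Python 3: try to decode the raw
bytes as UTF-8, and transcode them if that fails. The helper name
is_valid_utf8 and the sample bytes are made up, and Latin-1 as the source
encoding is only an assumption. Every byte sequence decodes as Latin-1, so
a successful decode proves nothing about the real encoding.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: text that was saved as Latin-1, not UTF-8.
latin1_bytes = "Sch\u00f6ch".encode("iso-8859-1")

# Transcode, assuming (not knowing) that the source encoding is Latin-1.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

latin1_is_utf8 = is_valid_utf8(latin1_bytes)
transcoded_is_utf8 = is_valid_utf8(utf8_bytes)
```

For a real file, the bytes would come from open(path, "rb").read() rather
than from a literal.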

Regards,
Martin


