Oct. 29, 2009
11:41 a.m.
Praktikant3 - SAG wrote:
The debugging continues. The issue below occurred when I read the file using:
input_xml = ET.parse(input_filename).getroot()
If I change this to:
input_xml = ET.XML(file(input_filename, "rb").read())
I get the UnicodeDecodeError in each of the (1)/(2) combinations.
Your input file has a bogus encoding declaration and/or encoding errors.
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 2: unexpected end of data
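This error can be reproduced outside ElementTree. What follows is only a minimal sketch, assuming Python 3 (the code quoted above is Python 2). The byte strings are made up for illustration and are not taken from the poster's file; only the 0xe5 byte at position 2 matches the traceback.

```python
# Illustrative data, not the poster's file: 0xE5 sits at position 2
# and nothing follows it, so the multi-byte sequence is cut short.
truncated = b"ab\xe5"

failure = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as err:
    failure = err  # keep a reference; 'err' is cleared after the except block

# The exception records the offset of the offending byte and the reason.
print(failure.start, failure.reason)

# The same lead byte followed by two continuation bytes (0x80..0xBF)
# decodes to a single character.
complete = b"ab\xe5\xa5\xbd"
text = complete.decode("utf-8")
print(len(text))
```

Inspecting `start` on the caught exception gives the byte offset to look at in the real input, which helps to tell a truncated read apart from a file that is simply not UTF-8.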
In UTF-8, 0xe5 is the lead byte of a 3-byte sequence, so it must be followed by two continuation bytes. The error message says the data ended before they arrived.
-- Marcello Perathoner webmaster@gutenberg.org