windows utf8 & lxml

Steve D'Aprano steve+python at
Tue Dec 27 05:46:35 EST 2016

On Tue, 20 Dec 2016 10:53 pm, Sayth Renshaw wrote:

>'utf-8'), parser=utf8_parser)
> However doing it in such a fashion returns this error:
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0:
> invalid start byte

That tells you that the XML file you have is not actually UTF-8.

You have a file that begins with a byte 0xFF. That is invalid UTF-8. No
valid UTF-8 string contains the byte 0xFF.
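
You can reproduce the error at the interactive prompt. This is only a minimal sketch: the four bytes below are made up for illustration (a UTF-16 byte-order mark followed by "<" in little-endian UTF-16) and are not taken from your actual file.

```python
# Hypothetical sample bytes, NOT from the real file:
# FF FE is a UTF-16 little-endian byte-order mark, then "<" as UTF-16-LE.
data = b"\xff\xfe<\x00"

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    # Keep the message so we can look at it.
    message = str(err)
    print(message)
```

The message should name the same byte (0xff) and the same position (0) as the traceback you posted.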

So you need to consider:

- Are you sure that the input file is intended to be UTF-8? How was it

- Is the second byte 0xFE? If so, that suggests that you actually have
UTF-16 with a byte-order mark.
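
One way to check is to look at the first few bytes of the file for a byte-order mark. The following is a rough sketch, not a complete encoding detector: `sniff_bom` is a name I have made up for illustration, and the sample data is generated in the script rather than read from your file. It uses only the BOM constants from the standard `codecs` module.

```python
import codecs

def sniff_bom(raw):
    """Guess a codec name from a leading byte-order mark, or return None.

    Hypothetical helper for illustration; it only recognises BOMs and
    says nothing about files that have none.
    """
    # Test UTF-32 first: the UTF-32-LE BOM (FF FE 00 00) begins with the
    # same two bytes as the UTF-16-LE BOM (FF FE).
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None

# Made-up sample: Python's "utf-16" encoder prepends a BOM.
sample = "<a/>".encode("utf-16")
guess = sniff_bom(sample)
print(guess)
print(sample.decode(guess))
```

For a real file you would open it in binary mode ('rb'), read the first four bytes, and pass them to the function. If it returns None, there is no BOM, and you are back to the first question: what encoding was the file actually written in?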

“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
