windows utf8 & lxml
Sayth Renshaw
flebber.crue at gmail.com
Tue Dec 20 06:53:42 EST 2016
Hi
I have been trying to get a script that works on Mint to work on Windows. The key blocker has been UTF-8 errors, most of which I have solved.
Now, however, for the last error I am trying to overcome, the solution appears to be to call .decode('windows-1252') to correct a decoding error.
I am using lxml to read my content, and decode is not supported there. Are there any known ways to read with lxml and fix Unicode faults?
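For context on why `.decode()` is not available on the parser: in Python 3, `.decode()` is a method of `bytes` objects, so any decoding has to happen on the raw file contents before they reach lxml. A minimal stdlib sketch, assuming Python 3; the sample string is made up for illustration:

```python
# .decode() belongs to bytes objects; an lxml XMLParser has no such method.
raw = "café".encode("windows-1252")  # b'caf\xe9', as a cp1252 editor would save it

try:
    raw.decode("utf-8")  # 0xe9 at the end of the data is not valid UTF-8
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

text = raw.decode("windows-1252")  # succeeds: every byte here maps to a cp1252 character
utf8_bytes = text.encode("utf-8")  # b'caf\xc3\xa9', suitable for a UTF-8 parser
print(text, utf8_bytes)
```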
The key part of my script is:
    from lxml import etree

    for content in roots:
        utf8_parser = etree.XMLParser(encoding='utf-8')
        # fails with AttributeError: XMLParser has no .decode() method
        fix_ascii = utf8_parser.decode('windows-1252')
        mytree = etree.fromstring(
            content.read().encode('utf-8'), parser=fix_ascii)
Without the added .decode(), my code looks like this:
    for content in roots:
        utf8_parser = etree.XMLParser(encoding='utf-8')
        mytree = etree.fromstring(
            content.read().encode('utf-8'), parser=utf8_parser)
However, doing it this way returns this error:
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I found this Stack Overflow answer for it, http://stackoverflow.com/a/29217546/461887, but cannot work out how to apply it with lxml.
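Byte 0xff at position 0 is most likely the start of a byte-order mark (BOM): `FF FE` begins a UTF-16 (or UTF-32) little-endian file, which is common on Windows (for example, files saved as "Unicode" in Notepad). The error text comes from Python's UTF-8 codec, so it is probably raised by `content.read()` before lxml is involved. A minimal sketch, assuming `content` can be opened in binary mode: the hypothetical helper `to_utf8_bytes` below sniffs the BOM using the standard `codecs` constants, decodes accordingly, and falls back to UTF-8 and then windows-1252 when there is no BOM.

```python
import codecs

# BOMs checked longest first: the UTF-32-LE BOM begins with the UTF-16-LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_utf8_bytes(raw: bytes, fallback: str = "windows-1252") -> bytes:
    """Hypothetical helper: re-encode raw file bytes as UTF-8.

    A leading BOM selects the codec; otherwise try UTF-8, then the fallback.
    """
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            # "utf-32", "utf-16" and "utf-8-sig" strip the BOM while decoding.
            return raw.decode(codec).encode("utf-8")
    try:
        raw.decode("utf-8")  # validate only; already UTF-8
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")
```

Under those assumptions, the call in the loop would become `etree.fromstring(to_utf8_bytes(content.read()), parser=utf8_parser)`, since the parser's `encoding='utf-8'` then matches the bytes it receives. Handing lxml the raw bytes with a parser that does not force an encoding may also work, as libxml2 can detect a BOM itself, but that is worth testing on the actual files.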
Ideas?
Sayth