windows utf8 & lxml
Sayth Renshaw
flebber.crue at gmail.com
Wed Dec 21 04:03:48 EST 2016
On Tuesday, 20 December 2016 22:54:03 UTC+11, Sayth Renshaw wrote:
> Hi
>
> I have been trying to get a script that works on Mint to also work on Windows. The key blocker has been UTF-8 errors, most of which I have solved.
>
> For the last remaining error, the solution appears to be to use .decode('windows-1252') to correct an ASCII error.
>
> I am using lxml to read my content, and it does not support decode. Are there any known ways to read with lxml and fix Unicode faults?
>
> The key part of my script is
>
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     fix_ascii = utf8_parser.decode('windows-1252')
>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=fix_ascii)
>
> Without the added .decode my code looks like
>
> for content in roots:
>     utf8_parser = etree.XMLParser(encoding='utf-8')
>     mytree = etree.fromstring(
>         content.read().encode('utf-8'), parser=utf8_parser)
>
> However doing it in such a fashion returns this error:
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
> I found this SO answer for it, http://stackoverflow.com/a/29217546/461887, but cannot seem to implement it with lxml.
>
> Ideas?
>
> Sayth
Why is Windows so hard? I am running out of ideas; I have tried the methods in the docs, on SO, etc.
Currently
for xml_data in roots:
    parser_xml = etree.XMLParser()
    mytree = etree.parse(xml_data, parser_xml)
Returns
C:\Users\Sayth\Anaconda3\envs\race\python.exe C:/Users/Sayth/PycharmProjects/bs4race/race.py data/ -e *.xml
Traceback (most recent call last):
  File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 100, in <module>
    data_attr(rootObs)
  File "C:/Users/Sayth/PycharmProjects/bs4race/race.py", line 55, in data_attr
    mytree = etree.parse(xml_data, parser_xml)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src\lxml\lxml.etree.c:81110)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:118109)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src\lxml\lxml.etree.c:118392)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src\lxml\lxml.etree.c:117180)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\lxml.etree.c:111907)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:105102)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:106769)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:12074)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src\lxml\lxml.etree.c:102431)
io.UnsupportedOperation: read
Process finished with exit code 1
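One direction that might be worth trying, offered as an untested sketch and not a confirmed fix: byte 0xff at position 0 is how a UTF-16 little-endian byte order mark starts (0xff 0xfe), so the files may not be UTF-8 or windows-1252 at all. The idea below is to open each file in binary mode and hand the raw bytes to the parser, so that it works out the encoding from the BOM and the XML declaration instead of Python decoding the text first. The helper names (sniff_bom, parse_raw, parse_path) are made up for illustration, and the fallback import is only there so the sketch runs where lxml is not installed.

```python
import codecs

try:
    from lxml import etree  # the library the script above uses
except ImportError:  # assumption: stdlib fallback so the sketch still runs
    import xml.etree.ElementTree as etree

# Byte order marks to look for; UTF-16 LE starts with 0xff 0xfe.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None


def parse_raw(raw):
    """Parse XML from bytes so the parser detects the encoding itself."""
    return etree.fromstring(raw)


def parse_path(path):
    """Hypothetical helper: read a file as bytes and parse it."""
    # Binary mode means Python does no text decoding of its own.
    with open(path, "rb") as handle:
        return parse_raw(handle.read())
```

Inside the loop this would be something like mytree = parse_path(path), assuming the items in roots can be turned into file paths. Calling sniff_bom() on the first few bytes of one file should at least show whether a BOM is present.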
Thoughts?
Sayth