encoding="utf8" ignored when parsing XML
skip.montanaro at gmail.com
Tue Dec 27 10:47:16 EST 2016
Peter> Isn't UTF-8 the default?
Apparently not. As I read the docs, the default is whatever
locale.getpreferredencoding() returns. That's problematic when you
live in a country that thinks ASCII is everything. Personally, I think
UTF-8 should be the default, but that train left the station long ago,
at least for Python 2.x.
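
For anyone who wants to check this on their own machine, here is a minimal sketch. It assumes a hypothetical sample string and a temporary file (neither comes from this thread). It prints the locale's preferred encoding, then writes and reads the text with an explicit encoding so the result does not depend on the locale.

```python
import io
import locale
import os
import tempfile

# The encoding the locale reports; this value varies from machine to machine.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical sample text containing one non-ASCII character.
text = u"caf\u00e9"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Passing encoding explicitly makes the result independent of the locale.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, "r", encoding="utf-8") as f:
        round_tripped = f.read()
finally:
    os.remove(path)
```

The explicit encoding argument is the only part that matters here. The printed default is informational and may well be something other than UTF-8.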
> Try opening the file in binary mode then:
>
>     with io.open(fname, "rb") as f:
>         root = xml.etree.ElementTree.parse(f).getroot()
Thanks, that worked. I would appreciate an explanation of why binary
mode was necessary. Since the file contents are text, just in a
non-ASCII encoding, it would seem that specifying the encoding when
opening the file should do the trick.
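
One possible explanation, offered as an assumption rather than something confirmed in this thread: the XML parser works on bytes and picks the encoding from the XML declaration itself, so handing it an already-decoded text stream can get in its way. The sketch below only demonstrates the binary-mode approach that worked. The sample document, the temporary file, and the helper name parse_binary are all hypothetical.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file is not shown in the thread.
XML_BYTES = (
    u'<?xml version="1.0" encoding="utf-8"?>\n'
    u"<root><name>caf\u00e9</name></root>"
).encode("utf-8")


def parse_binary(path):
    """Open the file as raw bytes and let the XML parser do the decoding."""
    with io.open(path, "rb") as f:
        return ET.parse(f).getroot()


# Write the sample bytes to a temporary file, parse it, then clean up.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(XML_BYTES)
try:
    root = parse_binary(path)
    name = root.find("name").text
finally:
    os.remove(path)
```

With the file opened in "rb" mode, the locale's preferred encoding never enters the picture, since no text decoding happens before the parser sees the data.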