encoding="utf8" ignored when parsing XML
__peter__ at web.de
Tue Dec 27 11:10:41 EST 2016
Skip Montanaro wrote:
> Peter> Isn't UTF-8 the default?
> Apparently not.
Sorry, I meant the default for XML.
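Here is the XML default in action. When ElementTree receives bytes with no encoding declaration, the parser assumes UTF-8. A declaration names a different encoding and overrides that. This is a minimal sketch; the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

# No XML declaration: the parser decodes the bytes as UTF-8.
utf8_bytes = u"<name>caf\u00e9</name>".encode("utf-8")
root = ET.fromstring(utf8_bytes)

# With a declaration, the parser uses the encoding it names instead.
latin1_bytes = u'<?xml version="1.0" encoding="iso-8859-1"?><name>caf\u00e9</name>'.encode("iso-8859-1")
latin1_root = ET.fromstring(latin1_bytes)

# Both parse to the same text.
assert root.text == latin1_root.text == u"caf\u00e9"
```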
> I believe in my reading it said that it used whatever
> locale.getpreferredencoding() returned. That's problematic when you
> live in a country that thinks ASCII is everything. Personally, I think
> UTF-8 should be the default, but that train's long left the station,
> at least for Python 2.x.
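For plain text files the locale really does decide the encoding when you don't pass one. It is roughly what locale.getpreferredencoding(False) reports, and the exact rule varies between Python versions. Naming the encoding explicitly takes the locale out of the picture. A sketch, using a throwaway temporary file:

```python
import io
import locale
import os
import tempfile

# Roughly what io.open() falls back to when no encoding is given.
print("locale default:", locale.getpreferredencoding(False))

text = u"caf\u00e9"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Explicit encoding on both write and read: independent of the locale.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, encoding="utf-8") as f:
        roundtrip = f.read()
finally:
    os.remove(path)

assert roundtrip == text
```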
>> Try opening the file in binary mode then:
>> import io
>> import xml.etree.ElementTree
>> with io.open(fname, "rb") as f:
>>     root = xml.etree.ElementTree.parse(f).getroot()
> Thanks, that worked. Would appreciate an explanation of why binary
> mode was necessary. Since the file contents are text, just in a
> non-ASCII encoding, it would seem that specifying the encoding when
> opening the file should do the trick.
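Below is a self-contained version of the binary-mode approach. It writes a hypothetical Latin-1 sample file with a matching XML declaration, then parses it. In binary mode the parser gets the raw bytes and decodes them itself, following the declaration:

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample file: Latin-1 bytes with a matching XML declaration.
xml_text = u'<?xml version="1.0" encoding="iso-8859-1"?>\n<person>Jos\u00e9</person>\n'
fd, fname = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as out:
    out.write(xml_text.encode("iso-8859-1"))

try:
    # Binary mode: the parser receives bytes and decodes them using the
    # encoding named in the XML declaration.
    with io.open(fname, "rb") as f:
        root = ET.parse(f).getroot()
finally:
    os.remove(fname)
```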
My tentative explanation would be this. If you open the file as text, it is
decoded successfully, i.e. reading it with the encoding you specify works.
But the XML parser needs bytes, and to turn the text back into bytes the
"preferred encoding" (ASCII in your case) is used.
Since the file contains non-ASCII characters, you get a UnicodeEncodeError.
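Assuming that explanation is right, the failing step can be reproduced on its own. Already-decoded text is encoded with ASCII, which cannot represent the non-ASCII character. The sample string is made up:

```python
# Decoding the file succeeded, so we already have text.
text = u"Jos\u00e9"

try:
    # Assumed failing step: converting the text back to bytes with ASCII.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)  # UnicodeEncodeError

# Re-encoding with the file's real encoding works fine.
assert text.encode("iso-8859-1") == b"Jos\xe9"
```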