[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Guido van Rossum
report at bugs.python.org
Fri Mar 12 01:35:34 CET 2010
Guido van Rossum <guido at python.org> added the comment:
Hey, can we all try to get along?
For anyone who didn't follow the link to r56841, that was mine (though Christian Heimes provided the basis for much of the patch apart from elementtree), and I wrote at the time:
"""I had to fix a few tests and modules beyond what Christian did, and
invent a few conventions. E.g. in elementtree, I chose to
write/return Unicode strings when no encoding is given, but bytes when
an explicit encoding is given."""
I am not a user of elementtree, so this may well have been a mistake -- at the time (in 2007) we were so busy making zillions of tests pass that some mistakes were made. Some of those were caught in time, others apparently not.
My thinking was that since an XML document looks like text, it should probably be considered text, at least by default. (There may have been some unittests that appeared to require this -- of course this was probably just the confusion between byte strings and 8-bit text strings inherent in Python 2.)
Regarding backwards compatibility, there are now two backwards compatibility problems: with 2.x, and with 3.1. It seems we cannot easily be backwards compatible with both (though if someone figures out a way that would be best of course).
If I were to propose an API for returning a Unicode string, I would probably add a new method (e.g. tounicode()) rather than using a "magical" argument (tostring(encoding=str)), but given that that exists in another supposedly-compatible implementation I'm not against it. Maybe tostring(encoding=None) could also be made to work? That would at least make it *possible* to write code that receives a text object and that works in 3.1 and 3.2 both. In 2.x I think neither of these should work, and there probably isn't a need -- apps needing full compatibility will just have to refrain from calling tostring() without arguments.
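As an illustration of the compatibility concern above, here is a minimal sketch of a wrapper that always yields a text object, whatever tostring() returns on a given version. The name to_text() is hypothetical and is not something proposed in this thread. The sketch assumes the convention quoted earlier, that passing an explicit encoding yields bytes, and it decodes only when bytes actually come back.

```python
import xml.etree.ElementTree as ET


def to_text(elem):
    """Hypothetical helper: serialise an element and always return str."""
    # Pass an explicit encoding so the call does not depend on the
    # no-argument default, which is the behaviour under discussion.
    data = ET.tostring(elem, encoding="utf-8")
    # Assumption: an explicit encoding yields bytes. Decode only if so,
    # so the helper also tolerates an implementation that returns text.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return data


root = ET.Element("root")
root.text = "caf\u00e9"
print(to_text(root))
```

The caller never invokes tostring() without arguments, which is the restriction suggested above for code that needs to behave the same across versions.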
ISTM that the behavior of write() is just fine -- the contents of the file will be correct after all.