[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
report at bugs.python.org
Wed Mar 3 08:15:25 CET 2010
New submission from Stefan Behnel <scoder at users.sourceforge.net>:
The xml.etree.ElementTree package in the Python 3.x standard library breaks compatibility with existing ET 1.2 code. The serialiser returns a unicode string when no encoding is passed. Previously, the serialiser was guaranteed to return a byte string, which by default was 7-bit ASCII compatible.
This behavioural change breaks all code that relies on the default behaviour of ElementTree. Since Python 3 no longer has a default encoding for implicit conversion, unicode strings are incompatible with byte strings, which means that the result of the serialisation can no longer be written to a file opened in binary mode, for example.
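A minimal sketch of the incompatibility described above, using only the standard library. The element name and text are made up for illustration, and the explicit encoding="us-ascii" argument is an assumption chosen to reproduce the 7-bit ASCII byte output that ET 1.2 code expects.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative element; the name and text are not from the report.
root = ET.Element("root")
root.text = "caf\u00e9"

# Requesting a byte encoding explicitly yields a byte string.
data = ET.tostring(root, encoding="us-ascii")

# Bytes can be written to a binary stream without any conversion.
binary_file = io.BytesIO()
binary_file.write(data)

# A unicode string cannot: Python 3 refuses to mix str and bytes.
try:
    binary_file.write("<root/>")
    error = None
except TypeError as exc:
    error = exc
```

Code that wrote the serialiser's result straight to a binary file runs into the second case whenever the result is a unicode string.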
XML is well defined as a stream of bytes. Redefining it as a unicode string *by default* is hard to understand at best.
Finally, it would have been good to look at the other ET implementation before introducing such a change. The lxml.etree package has had support for serialising XML into a unicode string for years, and does so in a clear, safe and explicit way. It requires the user to pass the 'unicode' (Py3 'str') type as the encoding parameter, which is explicit enough to make it clear that this is different from a normal encoding.
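As an illustration of that explicit style, the sketch below passes the str type as the encoding to lxml.etree.tostring(). It assumes the third-party lxml package is installed. If it is not, the sketch falls back to the standard library, which in current releases spells the same request as encoding="unicode". The element names and the TEXT_ENCODING helper name are made up for the example.

```python
try:
    # Third-party package; assumed to be installed.
    from lxml import etree
    TEXT_ENCODING = str
except ImportError:
    # Fallback: current stdlib releases accept the string "unicode" instead.
    import xml.etree.ElementTree as etree
    TEXT_ENCODING = "unicode"

# Illustrative tree; the names are not from the report.
root = etree.Element("root")
child = etree.SubElement(root, "child")
child.text = "caf\u00e9"

# Explicitly asking for text returns a unicode string.
as_text = etree.tostring(root, encoding=TEXT_ENCODING)

# Not asking for text returns an ASCII-compatible byte string.
as_bytes = etree.tostring(root)
```

The point of the design is that a unicode result has to be requested by name, so a call that does not request it keeps returning bytes.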
components: Library (Lib)
title: Serialiser in ElementTree returns unicode strings in Py3k
versions: Python 3.1, Python 3.2
Python tracker <report at bugs.python.org>