[New-bugs-announce] [issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Stefan Behnel report at bugs.python.org
Wed Mar 3 08:15:25 CET 2010


New submission from Stefan Behnel <scoder at users.sourceforge.net>:

The xml.etree.ElementTree package in the Python 3.x standard library breaks compatibility with existing ET 1.2 code. The serialiser now returns a unicode string when no encoding is passed. In ET 1.2, the serialiser was guaranteed to return a byte string, which by default was 7-bit ASCII compatible.

This behavioural change breaks all code that relies on the default behaviour of ElementTree. Since Python 3 no longer converts implicitly between unicode strings and byte strings using a default encoding, the two types are incompatible, which means that the result of the serialisation can no longer be written to a file opened in binary mode, for example.
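The incompatibility described above can be illustrated with a minimal sketch. It is an illustration only, not code from the affected applications: the element names are made up, an io.BytesIO object stands in for a file opened in binary mode, and the encoding is passed explicitly because the default return type of tostring() depends on the Python version.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
root = ET.Element("root")
ET.SubElement(root, "child").text = "text"

# With an explicit encoding the serialiser returns a byte string.
as_bytes = ET.tostring(root, encoding="us-ascii")

# io.BytesIO stands in for a file opened in binary mode.
binary_file = io.BytesIO()
binary_file.write(as_bytes)  # bytes into a binary stream: accepted

# Python 3 does not convert str to bytes implicitly, so writing a
# unicode string to the same stream is rejected.
try:
    binary_file.write("<root/>")
    str_write_accepted = True
except TypeError:
    str_write_accepted = False
```

Code written against ET 1.2 that writes the serialiser's result to a binary stream keeps working only if that result is a byte string, which is why the sketch requests one explicitly.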

XML is well defined as a stream of bytes. Redefining it as a unicode string *by default* is hard to understand at best.

Finally, it would have been good to look at the other ET implementation before introducing such a change. The lxml.etree package has had support for serialising XML into a unicode string for years, and does so in a clear, safe and explicit way. It requires the user to pass the 'unicode' (Py3 'str') type as the encoding parameter, e.g.

    etree.tostring(tree, encoding=str)

which is explicit enough to make it clear that this is different from a normal encoding.

----------
components: Library (Lib)
messages: 100333
nosy: scoder
severity: normal
status: open
title: Serialiser in ElementTree returns unicode strings in Py3k
type: behavior
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________

