[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Fredrik Lundh report at bugs.python.org
Fri Mar 12 11:14:08 CET 2010


Fredrik Lundh <fredrik at effbot.org> added the comment:

"Yes, the feature has been implemented deep down in the _encode() helper function, so it impacts the entire serialiser, not only its API"

Ouch.

>>> import locale
>>> locale.getpreferredencoding() == "utf-8"
False
>>> from xml.etree.ElementTree import *
>>> e = Element("tag")
>>> e.text = "hellö"
>>> tostring(e)
'<tag>hellö</tag>'
>>> ElementTree(e).write("out.xml")
>>> tree = parse("out.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\lib\xml\etree\ElementTree.py", line 843, in parse
    tree.parse(source, parser)
  File "C:\Python31\lib\xml\etree\ElementTree.py", line 581, in parse
    parser.feed(data)
  File "C:\Python31\lib\xml\etree\ElementTree.py", line 1221, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9
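
For comparison, here is a minimal sketch of the same round trip with the encoding passed explicitly. It assumes a current Python 3 release, where tostring() and ElementTree.write() accept an encoding argument and emit bytes when it is given. It is not the Python 3.1 behaviour shown in the session above, and the temporary path is illustrative only.

```python
import os
import tempfile
from xml.etree.ElementTree import Element, ElementTree, parse, tostring

e = Element("tag")
e.text = "hell\u00f6"  # same text as in the session above

# With an explicit encoding, tostring() returns encoded bytes
# instead of a str that depends on nothing but the caller's luck.
data = tostring(e, encoding="utf-8")

# Write to a throwaway location (assumption: any writable path will do).
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# The file is written as UTF-8 with an XML declaration, so the result
# does not depend on locale.getpreferredencoding().
ElementTree(e).write(path, encoding="utf-8", xml_declaration=True)

# Parsing the file back should recover the original text.
tree = parse(path)
roundtrip_text = tree.getroot().text
```

Passing the encoding on both calls keeps the serialised bytes and the declared encoding in agreement, which is what the parser needs to read the file back.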

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________

