[issue5166] ElementTree and minidom don't prevent creation of not well-formed XML

Denis S. Otkidach report at bugs.python.org
Fri Feb 6 12:13:45 CET 2009


New submission from Denis S. Otkidach <denis.otkidach at gmail.com>:

ElementTree and minidom allow the creation of XML that is not
well-formed and can't be parsed:

>>> from xml.etree import ElementTree
>>> element = ElementTree.Element('element')
>>> element.text = u'\0'
>>> xml = ElementTree.tostring(element, encoding='utf-8')
>>> ElementTree.fromstring(xml)
[...]
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1,
column 9

>>> from xml.dom import minidom
>>> doc = minidom.getDOMImplementation().createDocument(None, None, None)
>>> element = doc.createElement('element')
>>> element.appendChild(doc.createTextNode(u'\0'))
<DOM Text node "">
>>> doc.appendChild(element)
<DOM Element: element at 0xb7ca688c>
>>> xml = doc.toxml(encoding='utf-8')
>>> minidom.parseString(xml)
[...]
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, colum

I believe they should raise an exception when characters not allowed
in XML (http://www.w3.org/TR/REC-xml/#NT-Char) are used in attribute
values, text nodes and CDATA sections.
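
To make the request concrete, here is a minimal sketch of such a check,
written against the Char production of XML 1.0 linked above. It assumes
Python 3 string literals, and the helper name check_xml_text is
hypothetical: neither ElementTree nor minidom provides it, and this is
not a proposed patch, only an illustration of the validation meant.

```python
import re

# Complement of the XML 1.0 Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
#          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything matched by this pattern cannot appear in well-formed XML.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_text(value):
    """Return value unchanged, or raise ValueError on a non-XML character.

    Hypothetical helper, not part of xml.etree or xml.dom.
    """
    match = _ILLEGAL_XML_CHARS.search(value)
    if match is not None:
        raise ValueError(
            "character U+%04X at index %d is not allowed in XML"
            % (ord(match.group()), match.start())
        )
    return value
```

With a check like this applied when text or attribute values are
assigned or serialized, element.text = check_xml_text('\0') would fail
immediately instead of producing output that the parser later rejects.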

----------
components: Library (Lib)
messages: 81259
nosy: ods
severity: normal
status: open
title: ElementTree and minidom don't prevent creation of not well-formed XML
type: behavior
versions: Python 2.5, Python 2.6, Python 3.0

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5166>
_______________________________________

