[Python-Dev] Update xml.etree.ElementTree for Python 2.7 and 3.2

Stefan Behnel stefan_ml at behnel.de
Sat Feb 20 10:49:50 CET 2010


Florent Xicluna, 18.02.2010 10:21:
> For this purpose, I grew the test suite from 300 lines to 1800 lines, using both
> the tests from upstream and the tests proposed by Neil Muller on issue #6232.

Just a comment on this. While the new tests may work with ElementTree as
is, there are a couple of problems with them. They become apparent when
running the test suite against lxml.etree.

Some of the tests depend on specifics of the serialiser that may not be
guaranteed, such as the order of attributes or namespace declarations in a
tag, or whitespace before the self-closing tag terminator (" />"). ET 1.3
has a mostly redesigned serialiser, and it may receive another couple of
improvements or changes before it comes out. None of these features is
really required to hold for anything but the current as-is implementation.
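
To illustrate what a serialiser-independent check could look like, here is a
minimal sketch that compares parsed trees instead of output strings. The
helper name elements_equal() is hypothetical, not part of ElementTree, and
treating missing text as an empty string is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Compare two elements structurally, ignoring serialisation details.

    Hypothetical helper for illustration only; not an ElementTree API.
    """
    if a.tag != b.tag:
        return False
    # Attribute mappings compare equal regardless of declaration order.
    if dict(a.attrib) != dict(b.attrib):
        return False
    # Assumption: None and "" are treated as the same (no text).
    if (a.text or "") != (b.text or ""):
        return False
    if (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


# Same document, different attribute order and self-closing tag style.
first = ET.fromstring('<root a="1" b="2"><child /></root>')
second = ET.fromstring('<root b="2" a="1"><child/></root>')
third = ET.fromstring('<root a="1" b="3"><child/></root>')

same = elements_equal(first, second)
different = elements_equal(first, third)
```

A test written this way would hold for any serialiser that emits the same
infoset, whatever attribute order or whitespace it chooses.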

Other tests rely on non-XML being serialised, such as

	'text<subtag>subtext</subtag>'

This is achieved by setting "root.tag" to None - I'm not sure this is a
feature of ElementTree, and I'd be surprised if it was documented anywhere.

Several of the tests also use invalid namespace URIs like "uri" or attribute
names that are not well-formed, like "123". That's bad style at best.

There are also some tests for implementation details, such as the "_cache"
in ElementPath or the parser version (requiring expat), or even this test:

	element.getiterator == element.iter

which doesn't apply to lxml.etree, as its element.getiterator() complies
with the ET 1.2 interface for compatibility, whereas the new element.iter()
obeys the ET 1.3 interface that defines it. Asserting both to be equal
doesn't make much sense in the context of their specification.
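
As a rough sketch of the alternative, a test could check what iteration
yields rather than whether two methods are the same object. The getattr()
guard is an assumption of this sketch, there only so that it also runs on
implementations that do not provide getiterator().

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a><b/></a><c/></root>")

# Check observable behaviour: the tags seen in document order.
iter_tags = [el.tag for el in root.iter()]

# getiterator() may return a list (ET 1.2 style) or be absent entirely,
# so only compare results, and only when the method exists.
getiterator = getattr(root, "getiterator", None)
if getiterator is not None:
    legacy_tags = [el.tag for el in getiterator()]
else:
    legacy_tags = iter_tags
```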

Another example is

	check_method(element.findall("*").next)

In lxml.etree, this produces an

	AttributeError: 'list' object has no attribute 'next'

because element.findall() is specified in the official ET documentation to
return "a list or iterator", i.e. not necessarily an iterator but always an
iterable. There is also an iterfind(), matching ET 1.3, that would do the above.
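
A minimal sketch of how a test could stay portable here, relying only on
the result of findall() being iterable. The sample document is made up for
illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a/><b/><c/></root>")

# iter() accepts a list as well as an iterator, so this works with
# either return type of findall().
first_child = next(iter(root.findall("*")))

# Plain iteration makes no assumption about the return type either.
child_tags = [child.tag for child in root.findall("*")]

# Where ET 1.3 is available, iterfind() can be used when an iterator
# is what the test actually wants.
first_via_iterfind = next(root.iterfind("*"))
```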

So my impression is that many of the tests try to provide guarantees where
they cannot or should not exist, and even where the output is clearly
non-conforming with respect to standards. I don't think it makes sense to
put these into a regression test suite.

That said, I should add that lxml's test suite includes about 250 unit
tests that work with (and adapt to) lxml.etree, ElementTree and
cElementTree, in Py2.3+ and Py3.x, and with ET 1.2 and ET 1.3. Although
certainly not a copy&run replacement, those should be much better suited to
accompany the existing stdlib tests.

Stefan


