[XML-SIG] Learning XML processing with Python

Stefan Behnel stefan_ml at behnel.de
Sat Aug 9 15:39:35 CEST 2008


Bob Kline wrote:
> There are
> alternate candidates, but it's not clear which to adopt for future
> projects.  Part of the problem is that none of the packages in the
> standard library have any support for validating XML

That's too simply put. There's ElementTree in the standard library, which can
be replaced by lxml if you need validation (you can even choose between the
two with a runtime test at import time). It may not be a 100% drop-in
replacement (more like 98%), but it's really easy to keep that door open when
you write your code. A couple of integration tests can handle that pretty
well. Unless, obviously, you fall for the feature set and start depending on
lxml instead of plain ET. :)
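
A minimal sketch of such a runtime switch, assuming the code only uses the
API subset that both libraries share. The names HAVE_VALIDATION and titles()
are hypothetical and exist only for illustration:

```python
# Prefer lxml when it is installed, otherwise fall back to the standard
# library's ElementTree. Both are bound to the same name, ET.
try:
    from lxml import etree as ET
    HAVE_VALIDATION = True   # hypothetical flag: lxml can validate
except ImportError:
    import xml.etree.ElementTree as ET
    HAVE_VALIDATION = False  # plain ET: parsing only, no validation


def titles(xml_text):
    """Return the text of every <title> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # findall() with this simple path works the same way in both libraries.
    return [el.text for el in root.findall(".//title")]


if __name__ == "__main__":
    doc = "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
    print(titles(doc), "validation available:", HAVE_VALIDATION)
```

Running the same integration tests once with lxml installed and once
without it is one way to check that the code stays inside the shared subset.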

> and suggestions
> that such support would be useful (particularly in light of the SIG's
> stated goal to "make Python /the/ premier language for XML processing"
> [1]) are dismissed on the grounds that (1) most programmers don't want
> to validate the XML they get; and (2) there are too many possible
> validation techniques available, so we won't support any in the standard
> library.

Again, too simple. libxml2 supports all important schema languages (DTD, XML
Schema, RelaxNG, Schematron), and they are all exposed by lxml through the
same simple interface. I don't see why anyone should *refuse* to have them
all when they come for free. And if you really don't need them, good ol' ET
is still your friend.
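
A rough sketch of what "the same simple interface" looks like in lxml,
assuming lxml is installed (the sketch does nothing useful without it). The
two tiny schemas and the helper names build_validators() and check() are made
up for illustration:

```python
try:
    from lxml import etree
except ImportError:
    etree = None  # lxml not installed; check() then returns None

# Illustrative schemas: both accept a single <note> element holding text.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""

RNG = b"""<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>"""


def build_validators():
    """Build one validator per schema language (hypothetical helper)."""
    return {
        "xsd": etree.XMLSchema(etree.fromstring(XSD)),
        "rng": etree.RelaxNG(etree.fromstring(RNG)),
    }


def check(xml_bytes):
    """Validate a document against every schema; None if lxml is missing."""
    if etree is None:
        return None
    doc = etree.fromstring(xml_bytes)
    # Each validator class offers the same validate(doc) call.
    return {name: v.validate(doc) for name, v in build_validators().items()}


if __name__ == "__main__":
    print(check(b"<note>hello</note>"))
    print(check(b"<other/>"))
```

The calling code does not need to know which schema language sits behind a
validator, so switching languages is mostly a matter of loading a different
schema file.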

> There is support for validation in lxml, but having been
> burned once by building software against the predecessor of that
> package, only to have it abandoned, it's easy to understand reluctance
> to convert everything to depend on yet another package which isn't
> officially supported as part of the standard library.

I don't know what you call the "predecessor" of lxml here. In case you meant
PyXML, there is no link between the two at all (except that PyXML is amongst
the reasons why lxml exists). And if you meant ET, I have never heard of
anyone being "burned" by such a mature package.

I think it's definitely future-safe to write your code against ET today. And
it's also future-safe to convert existing code to ElementTree if you want it
to be based on a standard Python XML library (I wouldn't know anything more
"standard" than the standard library). Given that ET is much easier to use
than PyXML ever was, this should not be too hard for reasonably well-written
code. It's definitely work, though, and you'll likely end up throwing away a
lot of code. But it will also make you feel better. :)

And given that lxml will be ready for Py3 as soon as Cython is (which is
close, it already compiles and runs pretty well), and that both lxml and
Cython are very well maintained and actively developed projects, I don't see
why basing your software on lxml should be any less future-proof.

