[XML-SIG] SAX drivers comparison (PyXML 0.54).

Lars Marius Garshol larsga@garshol.priv.no
15 May 2000 17:26:22 +0200


I tried this experiment again, this time on a different Linux box with
Python 1.5.2 and with pyexpat compiled and installed, and also with
the CVS tree first on the PYTHONPATH.

This time I got this result:

[larsga@pc-larsga tmp]$ python fermigier.py content.example.txt 
!!! xml.sax.drivers.drv_sgmlop Error No parsers found
Parser: xml.sax.drivers.drv_pyexpat, time: 0.180707, 2877 bytes written.
!!! xml.sax.drivers.drv_xmltok Error No parsers found
Parser: xml.sax.drivers.drv_xmlproc, time: 0.613989, 2877 bytes written.
!!! xml.sax.drivers.drv_xmltoolkit Error No parsers found
Parser: xml.sax.drivers.drv_xmllib, time: 1.596865, 13025 bytes written.
!!! xml.sax.drivers.drv_xmldc Error No parsers found

xmlproc gives the same result as it did last time, and pyexpat agrees.
Neither agrees with Stephane's results.  This time xmllib does not
agree, though: like Stephane, I get a much larger result (13025 bytes
written, against 2877 for the other two).

A quick look at the output from xmllib shows that the problem is the
newest version of xmllib (which I didn't use in the previous test),
which does namespace processing, so all the element names come out as
'http://purl.org/dc/elements/1.0/ Title'. This processing can't be
turned off, so there is no cure for it except to use an older version.
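
As a rough illustration of what a workaround on the application side
might look like, the sketch below splits names of the form shown above
('URI localname', separated by a single space) back into their two
parts.  The helper name split_qualified_name is hypothetical and is not
part of xmllib or of any driver; it assumes the URI itself contains no
spaces.

```python
def split_qualified_name(name):
    """Split an xmllib-style 'URI localname' string into (uri, localname).

    Hypothetical helper for illustration only. Assumes the namespace URI
    and the local name are separated by one space and that the URI itself
    contains no spaces. Names without a namespace yield (None, name).
    """
    uri, sep, local = name.rpartition(' ')
    if not sep:
        # No space found: the element was not in any namespace.
        return (None, name)
    return (uri, local)


if __name__ == '__main__':
    print(split_qualified_name('http://purl.org/dc/elements/1.0/ Title'))
    print(split_qualified_name('Title'))
```

This only post-processes the names; it does not change the amount of
data the driver itself reports.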

With SAX 2.0 this problem will be handled, since there namespace
processing is the default, and namespace-less processing is optional.
When you try to turn it off the xmllib driver will complain, whereas
the ones for pyexpat and xmlproc will allow you to do it.
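
A minimal sketch of switching namespace processing on and off through
the SAX 2.0 feature mechanism follows.  It is written against the
xml.sax package as it ships in the current Python standard library
(expat-based), which may differ in detail from the PyXML drivers
discussed here; the NameCollector class, the element_names function
and the DOC sample are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Hypothetical handler that records every element name it is given."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: name is the raw tag.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is (uri, localname).
        self.names.append(name)


def element_names(data, namespaces):
    """Parse data (bytes) and return the element names the parser reports."""
    parser = xml.sax.make_parser()
    # A driver that cannot honour the request is expected to raise
    # SAXNotSupportedException or SAXNotRecognizedException here.
    parser.setFeature(feature_namespaces, namespaces)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


# Made-up sample document using the Dublin Core namespace from above.
DOC = b'<dc:Title xmlns:dc="http://purl.org/dc/elements/1.0/">x</dc:Title>'

if __name__ == '__main__':
    print(element_names(DOC, True))
    print(element_names(DOC, False))
```

Setting the feature explicitly in both cases keeps the example
independent of whatever default a particular driver happens to use.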

However, I'm still unable to find any reason why pyexpat and xmlproc
misbehave in Stephane's experiment.

--Lars M.