[XML-SIG] Problems with "ignorable whitespace" in Python's minidom and pulldom!

Andrew Clover and-xml at doxdesk.com
Sun Mar 14 19:08:16 EST 2004

Arno Wilhelm <quirxi at aon.at> wrote:

> I have found out that you are the author of the python pxdom module.

Yes. Hello!

> How is pxdom compared to the standard dom and minidom implementation shipped 
> with python itself?

I haven't devised any sort of proper DOM benchmark, but timing (a) the 
W3C DOM test suite and (b) the PXTL test suite showed pxdom to be 
similar in speed to PyXML's 4DOM and considerably slower than minidom.

It depends on what kind of operation is being done, of course. YMMV.
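
For a rough idea of how such a timing could be taken, here is a minimal 
sketch using only the standard library. The sample document and the 
time_parse helper are made up for illustration; only minidom is timed, 
and any other parser would need its own parse callable passed in.

```python
import timeit
from xml.dom import minidom

# Hypothetical sample document: 1000 small elements under one root.
SAMPLE = "<root>" + "<item n='1'>text</item>" * 1000 + "</root>"

def time_parse(parse, text, repeat=3):
    """Return the best wall-clock time (seconds) of `repeat` single parses."""
    return min(timeit.repeat(lambda: parse(text), number=1, repeat=repeat))

elapsed = time_parse(minidom.parseString, SAMPLE)
print("minidom: %.4f s" % elapsed)
```

Taking the minimum of a few runs reduces noise from other processes, but 
this measures parsing only, not tree manipulation.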

> Can it already be used in production environments ?

For the DOM Level 1 and 2 features, yes; these have been pretty static 
for some time. DOM Level 3 was a moving target up until recently, so 
that part of the implementation is less mature, but 1.0 [final] seems 
fairly stable. There is still a chance that the specification for DOM 
Level 3 might change again before it reaches final Recommendation, but 
hopefully not in any significant manner.

> How "fast" is it when parsing larger documents ?

Really slow. It's a pure-Python parser with minimal optimisation, quite 
apart from DOM memory footprint issues. pxdom aims for correctness and 
ease of embedding in other projects (without having to worry about 
which other XML packages, or which versions of them, are installed). 
Speed was not a design goal, and performance is very poor.

> Does that mean that [1.1] can also load external xml files linked
> to the actual xml document by a kind of url ?

Kind of: the URI is used in an entity declaration, for example:

   <!ENTITY fish SYSTEM "../entities/fish.ent">

Most notably, the DTD external subset ("xhtml1-strict.dtd" etc.) is 
loaded the same way, through the system identifier given in the 
<!DOCTYPE> declaration.
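
To illustrate how an external entity reference gets resolved, here is a 
sketch using the standard library's expat bindings rather than pxdom. 
The ENTITIES table stands in for the file system, and its content is 
invented for the example.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system identifier -> content.
ENTITIES = {"../entities/fish.ent": b"<fish>cod</fish>"}

DOC = (b'<!DOCTYPE menu [<!ENTITY fish SYSTEM "../entities/fish.ent">]>'
       b'<menu>&fish;</menu>')

elements = []
parser = xml.parsers.expat.ParserCreate()

def start_element(name, attrs):
    # Record every element seen, including those inside the entity.
    elements.append(name)

def external_entity_ref(context, base, system_id, public_id):
    # Parse the entity's replacement text with a sub-parser that
    # inherits the handlers of the main parser.
    sub = parser.ExternalEntityParserCreate(context)
    sub.Parse(ENTITIES[system_id], True)
    return 1  # non-zero tells expat to carry on

parser.StartElementHandler = start_element
parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOC, True)
print(elements)
```

A real resolver would turn the system identifier into a path or URL 
relative to the document's base URI instead of looking it up in a dict.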

The feature is only of use if you're dealing with DTD-reliant 
(non-standalone) documents. If you're looking for a general-purpose XML 
inclusion scheme, check out XInclude.
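
As a sketch of what XInclude processing looks like, the following uses 
the standard library's xml.etree.ElementInclude rather than pxdom. The 
FRAGMENTS table and the loader function are made up for the example and 
take the place of real file loading.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory documents keyed by href.
FRAGMENTS = {"fish.xml": "<fish>cod</fish>"}

def loader(href, parse, encoding=None):
    """Return an Element for parse="xml", or raw text for parse="text"."""
    text = FRAGMENTS[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text

doc = ET.fromstring(
    '<menu xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="fish.xml" parse="xml"/>'
    '</menu>'
)

# Replace each xi:include element in place with the loaded content.
ElementInclude.include(doc, loader=loader)
print(ET.tostring(doc, encoding="unicode"))
```

Unlike the entity mechanism, this needs no DTD: the inclusion is an 
ordinary namespaced element in the document body.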


Andrew Clover
mailto:and at doxdesk.com
