[XML-SIG] DOM performance?

Uche Ogbuji uche.ogbuji@fourthought.com
Sat, 08 Jun 2002 10:49:19 -0600

> Hello world,

Hey, Norm.  Great to have you around here.

> I've been meaning to learn Python for ages and I've finally found the
> project that I'm going to use to do so: a little UI thingy with an XML
> back end. Ok. so I grabbed PyXML 0.7.1 and installed it on my Linux
> box where I've got Python 2.1.
> Running a little test program to build a DOM tree (baby steps, to be sure):
> import sys
> from xml.dom.ext.reader import Sax2
> # create Reader object
> reader = Sax2.Reader()
> print "parse it!"
> # parse the document
> doc = reader.fromStream(sys.stdin)
> print "parsed!"
> I'm concerned about the performance. For a small XML document it works
> fine, but for the actual 1.4Mb XML document I need to read for my
> project, performance is abysmal (after several minutes, I gave up).
> Am I doing something obviously wrong?

Yeah.  You're using 4DOM.  Very thorough, but very slow.  You can get much 
better performance by using a different DOM implementation.

Option one: use the software you've already installed.

Use minidom instead:

import sys
from xml.dom import minidom
doc = minidom.parse(sys.stdin)
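To check how long minidom takes on your own machine before switching your real 1.4Mb file over, something like the following rough timing sketch may help.  It builds a synthetic document (the <records>/<record> element names are hypothetical placeholders, not from any real schema), parses it with minidom.parseString and reports the wall-clock time.  Actual numbers will depend on your hardware and Python version.

```python
import time
from xml.dom import minidom


def make_test_document(n_records):
    """Build a synthetic XML string containing n_records <record> elements.

    The element and attribute names are hypothetical placeholders.
    """
    parts = ["<records>"]
    for i in range(n_records):
        parts.append('<record id="%d"><name>item %d</name></record>' % (i, i))
    parts.append("</records>")
    return "".join(parts)


def time_minidom_parse(xml_text):
    """Parse xml_text with minidom; return (document, elapsed_seconds)."""
    start = time.time()
    doc = minidom.parseString(xml_text)
    elapsed = time.time() - start
    return doc, elapsed


if __name__ == "__main__":
    text = make_test_document(10000)
    doc, elapsed = time_minidom_parse(text)
    count = len(doc.getElementsByTagName("record"))
    print("parsed %d records (%d bytes) in %.2f s" % (count, len(text), elapsed))
```

For a file on disk or stdin, swap parseString for minidom.parse as shown above.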

Option two: Even faster, but requires additional software.

Install 4Suite 0.12.0, preferably a recent CVS snapshot.  Then use cDomlette:

import sys
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseStream(sys.stdin)

Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One (San Jose, Boston): http://www.xmlconference.com/
DAML Reference - http://www.xml.com/pub/a/2002/05/01/damlref.html
The Languages of the Semantic Web - http://www.newarchitectmag.com/documents/s=2453/new1020218556549/index.html
XML, The Model Driven Architecture, and RDF @ XML Europe - http://www.xmleurope.com/2002/kttrack.asp#themodel