[XML-SIG] DOM performance?
Sat, 08 Jun 2002 10:49:19 -0600
> Hello world,
Hey, Norm. Great to have you around here.
> I've been meaning to learn Python for ages and I've finally found the
> project that I'm going to use to do so: a little UI thingy with an XML
> back end. Ok. so I grabbed PyXML 0.7.1 and installed it on my Linux
> box where I've got Python 2.1.
> Running a little test program to build a DOM tree (baby steps, to be sure):
> import sys
> from xml.dom.ext.reader import Sax2
> # create Reader object
> reader = Sax2.Reader()
> print "parse it!"
> # parse the document
> doc = reader.fromStream(sys.stdin)
> print "parsed!"
> I'm concerned about the performance. For a small XML document it works
> fine, but for the actual 1.4Mb XML document I need to read for my
> project, performance is abysmal (after several minutes, I gave up).
> Am I doing something obviously wrong?
Yeah. You're using 4DOM. Very thorough, but very slow. You can get much
better performance by using a different DOM implementation.
Option one: use the software you've already installed.
Use minidom instead:
import sys
from xml.dom import minidom
doc = minidom.parse(sys.stdin)
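If you want to check the parse time on your own machine before committing to a DOM implementation, a rough sketch along these lines should do. The helper names build_sample and timed_parse and the synthetic <items> document are assumptions made up for illustration; they are not part of PyXML or the standard library, and the numbers you get will depend on your document and Python version.

```python
import time
from xml.dom import minidom


def build_sample(n_items):
    # Hypothetical test data: one <items> root with n_items <item> children.
    parts = ["<items>"]
    for i in range(n_items):
        parts.append('<item id="%d">value %d</item>' % (i, i))
    parts.append("</items>")
    return "".join(parts)


def timed_parse(xml_text):
    # Parse the text with minidom and report the elapsed wall-clock time.
    start = time.time()
    doc = minidom.parseString(xml_text)
    elapsed = time.time() - start
    return doc, elapsed


doc, seconds = timed_parse(build_sample(1000))
item_count = len(doc.getElementsByTagName("item"))
print("parsed %d items in %.3f seconds" % (item_count, seconds))
```

To time your real document instead, replace the build_sample call with the contents of your own file.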
Option two: Even faster, but requires additional software.
Install 4Suite 0.12.0, preferably a recent CVS snapshot. Then use cDomlette:
import sys
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseStream(sys.stdin)
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Track chair, XML/Web Services One (San Jose, Boston): http://www.xmlconference.com/
DAML Reference - http://www.xml.com/pub/a/2002/05/01/damlref.html
The Languages of the Semantic Web - http://www.newarchitectmag.com/documents/s=2453/new1020218556549/index.html
XML, The Model Driven Architecture, and RDF @ XML Europe - http://www.xmleurope.com/2002/kttrack.asp#themodel