"drop-in" DOM replacement for minidom?

Thu Aug 14 01:07:48 EDT 2003

Harry George <harry.g.george at boeing.com> wrote in message news:<xqxekzpa45k.fsf at cola2.ca.boeing.com>...
> Paul Miller <paul at fxtech.com> writes:
> 
> Switching to
> SAX was a major improvement in mem usage and thus in parse time.
> 

As an alternative you can easily build a custom, lightweight, Object
Model.  I'm using one designed naively to reflect the set of elements
used in the several XML schemas we use.  I use SAX to parse the
document into our object model and then have the convenience of
programming against a nicer (in some ways DOM-like) interface.

Basically there is a class Element which (since Python 2.2) is a
subclass of list.  By convention it can contain either unicode strings
(character data) or other elements.  The XML attributes can be stored
either in a dictionary or, as I eventually did, directly as attributes
of the instance.  Record the parent element (aka location), add some
methods such as nextSibling() etc. and you're on your way.
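
A minimal sketch of such an Element class follows.  It is my reading of
the description above, not the poster's actual code: the add() helper
and the constructor signature are assumptions, and it is written for
current Python 3, where str takes the place of unicode.

```python
class Element(list):
    """Lightweight XML node; the list items are its children."""

    def __init__(self, tag, parent=None, **attrs):
        super().__init__()
        self.tag = tag
        self.parent = parent
        # XML attributes are stored directly as instance attributes.
        for name, value in attrs.items():
            setattr(self, name, value)

    def add(self, child):
        """Append a child (hypothetical helper) and record its parent."""
        if isinstance(child, Element):
            child.parent = self
        self.append(child)
        return child

    def nextSibling(self):
        """Return the node following this one in the parent, or None."""
        if self.parent is None:
            return None
        # Compare by identity: list equality would compare the children.
        for i, node in enumerate(self.parent):
            if node is self:
                if i + 1 < len(self.parent):
                    return self.parent[i + 1]
                return None
        return None


root = Element("book", title="Example")
ch1 = root.add(Element("chapter", number="1"))
root.add("some text")
ch2 = root.add(Element("chapter", number="2"))
```

Storing XML attributes as instance attributes is convenient but means
an XML attribute named "tag" or "parent" would clash with the fields
used here; the dictionary variant avoids that.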

In our case I've adopted a naive approach, i.e. there is a separate
class for every type of XML element (all of which ultimately derive
from Element).  This suffers from being non-general (it is tied to the
particular set of schemas we use), but it has the advantage that you
don't have to look up what kind of Element you are dealing with and
then decide what to do with it; you can use polymorphism instead.
Further, there is no conceptual difference between a chunk of XML and
the Python object structure (i.e. Elements within Elements) used to
represent it.
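
To illustrate the class-per-element idea, here is a rough sketch that
uses a SAX handler to build the tree and lets each class decide how to
render itself.  The element names (doc, para, emph), the render()
method, the REGISTRY table and the parse() helper are all invented for
the example; only xml.sax and its ContentHandler callbacks are standard
library.

```python
import xml.sax


class Element(list):
    tag = None

    def __init__(self, parent=None, **attrs):
        super().__init__()
        self.parent = parent
        for name, value in attrs.items():
            setattr(self, name, value)

    def render(self):
        # Default behaviour: concatenate text and rendered children.
        return "".join(c if isinstance(c, str) else c.render() for c in self)


class Doc(Element):
    tag = "doc"


class Para(Element):
    tag = "para"

    def render(self):
        return super().render() + "\n"


class Emph(Element):
    tag = "emph"

    def render(self):
        return "*" + super().render() + "*"


# Maps XML tag names to the classes that represent them.
REGISTRY = {cls.tag: cls for cls in (Doc, Para, Emph)}


class Builder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        cls = REGISTRY[name]  # raises KeyError for tags outside the schema
        node = cls(parent=self.current, **dict(attrs.items()))
        if self.current is None:
            self.root = node
        else:
            self.current.append(node)
        self.current = node

    def characters(self, content):
        # SAX may deliver one run of text in several chunks.
        if self.current is not None:
            self.current.append(content)

    def endElement(self, name):
        self.current = self.current.parent


def parse(text):
    handler = Builder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.root


doc = parse('<doc><para kind="intro">Hello <emph>world</emph></para></doc>')
```

Calling doc.render() walks the tree without any "what kind of element
is this?" test; each class supplies its own behaviour.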

It was because Python is so well suited to this kind of thing that I
originally adopted it.  As an aside, I wrote an XSLT stylesheet which
reads the various XML Schema files (I only write DTDs myself, relying
on converters to generate the xsd) and writes out the Python stub
code (i.e. it creates the basic class definition for each element,
adding the appropriate attributes etc.).  This saves a lot of boring
boilerplate typing and allows for quick and accurate code updates when
new attributes are added to the schema.

Going about it in this kind of way, you get something of much lighter
weight than DOM, but which does have that nice structural (as opposed
to SAX's event-driven) way of working with XML.