[XML-SIG] The Zen of DOM

Ludvig Svenonius ludvig.svenonius@excosoft.se
Mon, 10 Apr 2000 10:56:22 +0200

I've written a couple of applications that do what you describe and for me
it is the only alternative when it comes to reading in XML data. The 4DOM
parser has limited support for this sort of thing, I've found. If you use
the 4DOM Sax parser (actually a DOM parser, found in Ft/Dom/Ext/Reader), you
can easily specify that you want to use a custom Document instance. Because
the Document is also a node factory for the resulting DOM, you can subclass
the default Document class and parse the XML file using your own custom
document, with overridden methods for things like creating elements.
Typically, an overridden element factory method (createElement or
createElementNS) checks the namespace URI and tag name and if a match is
found, returns some custom object like you describe, otherwise it simply
calls the same method of its superclass, Document. The custom object
returned by the element factory method should probably inherit from the
Element class, but you could of course override its functionality.
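
The factory idea above can be sketched with the Python standard library.
This is only an illustration under stated assumptions: it uses xml.sax and
xml.dom.minidom rather than the 4DOM reader, and the names CustomDocument,
SlideElement, DomBuilder, parse_with_factory and the namespace URI are all
hypothetical. A small SAX handler builds the tree by calling the document's
createElementNS, so an overridden factory method decides which class each
element gets.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.dom import minidom

SLIDE_NS = "http://example.com/slides"  # hypothetical namespace URI


class SlideElement(minidom.Element):
    """Custom element returned for <slide> in SLIDE_NS (illustrative)."""

    def title(self):
        return self.getAttribute("title")


class CustomDocument(minidom.Document):
    """Document subclass acting as the node factory."""

    def createElementNS(self, namespaceURI, qualifiedName):
        local = qualifiedName.split(":")[-1]
        if namespaceURI == SLIDE_NS and local == "slide":
            prefix = qualifiedName.split(":")[0] if ":" in qualifiedName else None
            element = SlideElement(qualifiedName, namespaceURI, prefix)
            element.ownerDocument = self
            return element
        # No match: fall back to the superclass factory.
        return super().createElementNS(namespaceURI, qualifiedName)


class DomBuilder(ContentHandler):
    """Builds a DOM tree, creating every node through the given document."""

    def __init__(self, document):
        super().__init__()
        self.document = document
        self.stack = [document]

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        element = self.document.createElementNS(uri, qname or local)
        for (attr_uri, attr_local), value in attrs.items():
            if attr_uri is None:
                element.setAttribute(attr_local, value)
            else:
                element.setAttributeNS(attr_uri, attr_local, value)
        self.stack[-1].appendChild(element)
        self.stack.append(element)

    def endElementNS(self, name, qname):
        self.stack.pop()

    def characters(self, content):
        if len(self.stack) > 1:  # ignore anything outside the root element
            self.stack[-1].appendChild(self.document.createTextNode(content))


def parse_with_factory(xml_text, document):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(DomBuilder(document))
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return document
```

Calling parse_with_factory(text, CustomDocument()) returns the document; any
slide element in the hypothetical namespace comes back as a SlideElement,
and every other element is a plain minidom Element.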

Unfortunately, this is not so easy to accomplish using 4DOM, I've found,
because the Sax parser previously mentioned doesn't seem to include
namespace support. There is an alternative parser called Sax2 that includes
NS support, but for some reason I've been unable to figure out, the ability
to specify a custom Document instance in the Sax2 parser has been removed. I
had to hack my way around this problem, resulting in a somewhat unstable
solution (it seems like the semantics of the Sax2 parser are a bit different
under Unix and Win32). I had to access a 'private' member variable inside
the Sax2 parser instance in order to forcibly replace the default Document
instance before parsing. Perhaps someone working on 4DOM can provide some
insight into a better way, or at least an explanation of why the custom
element factory feature has been removed in Sax2.

I have example code, although I cannot access it from where I am right now.
Get back to me privately if you want me to send it.

Ludvig Svenonius
Excosoft AB
lss@excosoft.se / dominio@hem.passagen.se

-----Original Message-----
From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org] On
Behalf Of Andy Robinson
Sent: Wednesday, April 05, 2000 4:28 PM
To: Xml-Sig@Python. Org
Subject: [XML-SIG] The Zen of DOM

Looking for spiritual guidance about the right way to do things...

I've been slogging my way through the current XML package looking at many
different ways of parsing XML documents into my own Python object models.
The target is currently "pythonPoint Markup Language", a markup for creating
PDF presentation slides in ReportLab; but I'll need to do many similar
parsers in future.

At the moment, I have a Python class hierarchy with things like
Presentation, Slide, Frame, Paragraph, and various primitive shapes to
decorate pages.  I use a parser derived from xmllib which walks through a
document, and I wrote start_slide/end_slide, start_para/end_para handlers
which construct instances of my own objects and build a tree.

It seems to me that one could use Python's extreme flexibility to take a
generic approach to tree-building, and see if there was a class available
corresponding to a particular tag before creating some generic node; if so,
create it, pass it the available attributes, then pass child nodes to an
add() method so it could organize them itself.  Then I could magically end
up with a notation like...
...without having to write lots of new stuff in the parser as well as the
application class hierarchy every time.  Or at least to navigate the tree
using generic node/child notation, but get my own class instances attached
at each point.
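
A rough sketch of that generic approach, assuming a DOM tree from
xml.dom.minidom and a simple tag-to-class table. REGISTRY, GenericNode,
build and load are hypothetical names, and the Presentation and Slide
classes here are stand-ins rather than the real PythonPoint ones. Each
class receives a dict of attributes and gets its children through add().

```python
from xml.dom import minidom


class GenericNode:
    """Fallback used when no class is registered for a tag."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []

    def add(self, child):
        self.children.append(child)


class Slide:
    def __init__(self, attrs):
        self.title = attrs.get("title", "")
        self.content = []

    def add(self, child):
        self.content.append(child)


class Presentation:
    def __init__(self, attrs):
        self.attrs = attrs
        self.slides = []

    def add(self, child):
        # The object organizes its own children.
        if isinstance(child, Slide):
            self.slides.append(child)


# Tag name -> application class (hypothetical table).
REGISTRY = {"presentation": Presentation, "slide": Slide}


def build(node):
    """Turn a DOM node into an application object, a GenericNode, or text."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    attrs = dict(node.attributes.items())
    cls = REGISTRY.get(node.tagName)
    obj = cls(attrs) if cls else GenericNode(node.tagName, attrs)
    for child in node.childNodes:
        is_element = child.nodeType == child.ELEMENT_NODE
        is_text = child.nodeType == child.TEXT_NODE and child.data.strip()
        if is_element or is_text:
            obj.add(build(child))
    return obj


def load(xml_text):
    return build(minidom.parseString(xml_text).documentElement)
```

Supporting a new tag then only means adding a class and a REGISTRY entry;
the tree walker itself stays unchanged.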

To turn this on its head, there must be a generic way to turn a Python class
instance into XML, and unserialize it again later.
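
One very simple way to picture the round trip, assuming every instance
attribute is a plain string and using xml.etree.ElementTree. Point,
CLASSES, to_xml and from_xml are hypothetical names; this is a sketch, not
an existing facility of the XML package.

```python
import xml.etree.ElementTree as ET


class Point:
    def __init__(self, x="0", y="0"):
        self.x = x
        self.y = y


# Tag name -> class, used when reading the XML back (hypothetical table).
CLASSES = {"Point": Point}


def to_xml(obj):
    """Write the class name as the tag and each attribute as a child element."""
    elem = ET.Element(type(obj).__name__)
    for name, value in vars(obj).items():
        ET.SubElement(elem, name).text = str(value)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Rebuild an instance; all attribute values come back as strings."""
    elem = ET.fromstring(text)
    cls = CLASSES[elem.tag]
    obj = cls.__new__(cls)  # skip __init__, then restore attributes directly
    for child in elem:
        setattr(obj, child.tag, child.text or "")
    return obj
```

Nested objects, lists and non-string values would need type information in
the markup, which this sketch deliberately leaves out.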

Has anyone actually worked on this?  Is there a solution lurking in the
package somewhere?  Or is the preferred approach to get a DOM tree, then
walk through it building my own objects?

Thanks very much,

Andy Robinson

XML-SIG maillist  -  XML-SIG@python.org