DOM to SAX using generators?

Alan Kennedy alanmk at hotmail.com
Mon Oct 14 11:26:41 EDT 2002


Hi all,

I'm writing some tree (XML) processing stuff at the moment, and working
with a variety of SAX processors to process chains of events.

One of the things that I do with the trees is to generate SAX events, by
doing a recursive descent on the tree. This is very straightforward, and
easily implemented.

However, I've been itching to make use of generators, and at first this
appeared to be a classic case for using generators. DOM trees and
generators seem like a very natural fit, and I see that just today Uche
Ogbuji announced his article on exactly this topic.

http://www-106.ibm.com/developerworks/xml/library/x-tipgenr.html

However, when producing SAX events from some types of DOM node, e.g.
elements, documents, etc, that contain child nodes, one needs to produce
a "start" event, produce events for all of the child nodes, and then
produce an "end" event.

But when I try to come up with an algorithm to do that using generators,
I have trouble. I can have a generator which delivers the nodes in
document order, like so

def genNodes(node):
    yield node
    if node.children:
        for child in node.children:
            yield genNodes(child)

And then I can have a "genEvents" method, like so

def genEvents(node, consumer):
    if node.nodeType == DOCUMENT_NODE:
        consumer.startDocument()
        # What goes here?
        consumer.endDocument()
    elif node.nodeType == ELEMENT_NODE:
        # etc, etc

And then lastly I use a simple for loop to iterate over the nodes.

for n in genNodes(tree):
    genEvents(n, consumer)

How can I make this work? The obvious way that I can see to generate the
events for the child nodes (in the section marked "What goes here") is
to recursively visit the children. But that's the wrong approach, since
I'll end up processing the children twice: once through the recursive
descent and once through the series of nodes returned by the generator.

One method that oocurs to me is that I keep a stack of nodes that have
been visited so far, and generate the end events as they are popped off
the stack (when?). However, this involves creating a data structure that
decreases the efficiency of the algorithm.

How can I make generators to do this in a way that is more efficient
than the simple recursive descent approach?

Actually, thinking further about it, another possibility suggests
itself: have the generator yield nodes twice, like so:

START = 1
END = 2

def genNodes(node):
    yield START, node
    if node.children:
        for child in node.children:
            yield genNodes(child)
    yield END, node

But that doesn't seem entirely satisfactory either. It appears to double
the number of generator calls, thus making it less efficient.

Regards,
-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



More information about the Python-list mailing list