[Doc-SIG] How to traverse a document object

Paul Moore gustav@morpheus.demon.co.uk
Mon, 22 Oct 2001 23:55:18 +0100


On Mon, 22 Oct 2001 20:45:18 +0100, Paul Moore
<gustav@morpheus.demon.co.uk> wrote:

>I don't know if I'm missing something stupidly obvious here... I want to
>traverse a document generated by a dps.parsers.restructuredtext.Parser
>instance. I can see no way of doing so short of a manual tree-walk with
>type checks all the way - you know::

Hmm. I dug a bit deeper - I can do a bit better, by switching on the
tagname attribute. But I can't find a robust way of terminating the
recursion. I can't check the "children" attribute (if it's zero, don't
recurse) as text nodes don't have this attribute. OK, getattr(node,
'children', 0) works, but that looks ugly.
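
A minimal sketch of that getattr() fallback, for illustration. The Element and Text classes here are hypothetical stand-ins, not the real dps.nodes classes; the only assumption carried over is that element nodes have "children" and "tagname" while text nodes have a tagname of "#text" and no "children" attribute:

```python
# Stand-in classes for illustration only; these are NOT the real dps.nodes classes.
class Element:
    def __init__(self, tagname, *children):
        self.tagname = tagname
        self.children = list(children)


class Text:
    tagname = "#text"  # text nodes deliberately have no 'children' attribute

    def __init__(self, data):
        self.data = data


def walk(node, visit, depth=0):
    """Pre-order walk; a node without 'children' simply ends the recursion."""
    visit(node, depth)
    # An empty tuple as the default means the loop body never runs for text nodes.
    for child in getattr(node, "children", ()):
        walk(child, visit, depth + 1)


document = Element("document",
                   Element("paragraph", Text("Hello")),
                   Element("paragraph", Text("world")))

seen = []
walk(document, lambda node, depth: seen.append((depth, node.tagname)))
```

Defaulting to an empty tuple rather than 0 lets the same loop serve both node kinds, so no explicit type or tagname check is needed.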

The problem seems to be that the _Node class has no useful attributes or
methods to handle tree walks, whereas _Element contains children which
are not themselves _Elements (via _TextElement, which again has no
attributes to let me notice what's going on).

OK, I can check for a tagname of "#text". That seems to work. But
it doesn't feel natural at all - it smacks too much of magic
numbers...

Grmph. This feels like it should be a natural application for the
"Visitor" pattern. The following works::

    # "Acceptor methods" - see the Visitor pattern for details

    # This one has to handle children
    def AcceptNode(self, visitor):
        visitor.Visit(self)
        for child in self.children:
            child.Accept(visitor)

    # This one doesn't handle children
    def AcceptText(self, visitor):
        visitor.Visit(self)

    # Install the methods - this feels as if it's
    # being unacceptably chummy with the node classes -
    # particularly by referring to the _Element class,
    # which has a leading underscore
    import dps.nodes
    dps.nodes._Element.Accept = AcceptNode
    dps.nodes.Text.Accept = AcceptText

    # Define a visitor, which just prints the tagname
    class V:
        def Visit(self, node):
            print node.tagname

    # Walk the document tree
    document.Accept(V())

In fact, this seems fairly nice, except for the fact that [Offtopic]_

a. I'm inserting new methods into the node classes, which is a little
   presumptuous of me.
b. I need to know the class names of the node classes, which seems to
   be too tied to implementation specifics.

Nevertheless, it's a pretty good tree walking model. It might be nice to
have this in the node classes, except that there may not be a single
"correct" walk order (a bit like the normal preorder, postorder, inorder
issues).
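
One way to sidestep both the walk-order question and the method injection, sketched under assumptions: a standalone walk function that calls the visitor on the way in and on the way out, so a visitor can act in preorder, postorder, or both. Element, Text, TagCollector, enter and leave are all hypothetical names for illustration, not part of dps.nodes:

```python
# Stand-in classes for illustration only; these are NOT the real dps.nodes classes.
class Element:
    def __init__(self, tagname, *children):
        self.tagname = tagname
        self.children = list(children)


class Text:
    tagname = "#text"  # no 'children' attribute, as with the text nodes described above

    def __init__(self, data):
        self.data = data


class TagCollector:
    """Hypothetical visitor: records tag names on entry and again on exit."""

    def __init__(self):
        self.entered = []
        self.left = []

    def enter(self, node):
        self.entered.append(node.tagname)

    def leave(self, node):
        self.left.append(node.tagname)


def walk(node, visitor):
    """Depth-first walk that needs no methods added to the node classes."""
    visitor.enter(node)                          # preorder hook
    for child in getattr(node, "children", ()):
        walk(child, visitor)
    visitor.leave(node)                          # postorder hook


document = Element("document",
                   Element("section", Text("a"), Text("b")),
                   Element("paragraph", Text("c")))

collector = TagCollector()
walk(document, collector)
```

After the walk, collector.entered holds the preorder sequence and collector.left the postorder one. Because the function lives outside the node classes, it needs neither their class names nor new methods on them.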

.. [Offtopic] I would normally indent this list if I were writing
   "plain" (i.e., no markup) text. I'm not sure what effect such
   indentation would have on reST. I get the impression I'd get an
   extra "blockquote" element that I didn't want. Is this harmful?
   (I guess that depends on the output formatter, and so the answer
   has to be "possibly"). Can it be avoided? In "plain" text, I
   *really* prefer the look of lists when they are indented.

   Even more offtopic - is this indented enough to be part of the
   footnote? I think the indentation rules need more clarification...
   [I did a test, and it is included...]

Sorry, this has all turned into a bit of a brain-dump. But that's
probably because I'm feeling that I'm having to invent something that I
expected to be part of the basic infrastructure. Is it simply that
no-one's got to the point of needing this implemented yet? It's an
"output" issue, and the lack of output generators suggests that that may
well be the problem.

Thanks for listening,
Paul.