[Doc-SIG] How to traverse a document object

Paul Moore gustav@morpheus.demon.co.uk
Thu, 25 Oct 2001 22:35:44 +0100

On Wed, 24 Oct 2001 23:46:10 -0400, David Goodger
<goodger@users.sourceforge.net> wrote:

>>> The document tree is meant to be a specific document/DTD/schema
>>> implementation only, not a generic DOM.
>> I'm not sure I understand what you mean here. ...
>> (what do you mean by a "generic DOM"?)
>DOM is a generic XML data structure. It contains an ``Element`` class
>(among others), whose instances represent all elements. If you want to
>store a ``list`` element, it would be an ``Element`` instance whose
>``tagName`` attribute was set to "list". It's not very useful from an
>object-oriented programming point of view; you have to switch on the
>``tagName`` instead of using polymorphism.

Got you. That makes complete sense. But why does all my tree walking
stuff (and Tibs') spend its time switching on tagname then? Actually, I
know the answer to this - coding a "proper" object-oriented tree-walk is
hard. It's the sort of thing the Visitor pattern is intended to handle,
but as I pointed out in another message, Visitor relies on the right
infrastructure being in place in the "visited" hierarchy - and designing
that infrastructure is hard.
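
A minimal sketch of what that infrastructure might look like, for
illustration only. The ``Node`` base class, the ``walk`` method, and the
``visit_*`` / ``default_visit`` naming are all hypothetical; none of them
is taken from the DPS nodes.py. The point is that the dispatch on class
name happens once, inside the visited hierarchy, instead of being
repeated as a tagname switch in every tree walker.

```python
class Node:
    """Hypothetical stand-in for a doc-tree element (not the DPS class)."""

    def __init__(self, *children):
        self.children = list(children)

    @property
    def tagname(self):
        # The tag name is derived from the class name.
        return type(self).__name__

    def walk(self, visitor):
        # Dispatch on the node's class: use visit_<tagname> if the
        # visitor defines it, otherwise fall back to default_visit.
        method = getattr(visitor, "visit_" + self.tagname,
                         visitor.default_visit)
        method(self)
        for child in self.children:
            if isinstance(child, Node):
                child.walk(visitor)


class paragraph(Node): pass
class emphasis(Node): pass
class literal(Node): pass


class Outline:
    """Example visitor: records what it saw, with one special case."""

    def __init__(self):
        self.seen = []

    def default_visit(self, node):
        self.seen.append(node.tagname)

    def visit_emphasis(self, node):
        self.seen.append("*emphasis*")


tree = paragraph("Some ", emphasis("marked"), " text with ", literal("code"))
outline = Outline()
tree.walk(outline)
print(outline.seen)
```

The visitor contains no ``if tagname == ...`` chain, but only because
``Node.walk`` already exists. Without that hook in the visited classes,
every walker is back to switching on the tag name.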

It's probably significant that most of the classes in the DPS doc tree
are of the form

    class paragraph(_TextElement): pass

There's no polymorphism here. The class name is *only* relevant in
setting the tagname attribute via introspection. In many ways, this tree
isn't really object-oriented at all. Let me come back to this later_.
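
To make the observation concrete, here is a rough sketch of the idiom as
I read it. This is a simplified, hypothetical ``_TextElement``; the real
base class in nodes.py may set the tag name differently. The subclasses
add no behaviour of their own, so they differ only in the name that
introspection picks up.

```python
class _TextElement:
    """Simplified, hypothetical base class (not the real nodes.py one)."""

    def __init__(self, text=""):
        self.text = text
        # The subclass name is used purely as data: it becomes the tag.
        self.tagname = self.__class__.__name__


class paragraph(_TextElement): pass
class title(_TextElement): pass


p = paragraph("Some text")
t = title("A heading")
print(p.tagname, t.tagname)
```

Both instances respond to exactly the same methods; only ``tagname``
tells them apart.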

>It's free, yes, but the cost is too high. It depends on how you want to
>build the data structure, and what you want to do with the data
>once it's complete. In most XML-processing applications, you parse an
>already-existing XML file to a data structure, for which DOM is a valid
>choice. The reStructuredText parser is *building* a document tree,
>and it's easier and more powerful to say ``node = nodes.list()`` than
>it is to say ``node = minidom.Element("list")``, especially when you can
>customize the ``nodes.list`` class with specialized behaviour.

.. _later:

OK, I see the point. I was looking at using the tree, not building it. I
agree that building trees using DOM is verbose and clumsy (I've seen
code for it before). So maybe building using specialised code, then
using asdom() to get a DOM, which can then be processed by standard XML
tools, is a valid approach.

But as Juergen Hermann pointed out, asdom() is only one (trivial)
example of a visitor pattern, so we probably need to factor out the
visitor, and reimplement asdom in terms of it. I'll look at this.
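
As a first rough idea of the shape this could take (a sketch under
assumptions, not a proposal for the actual code): the ``Node`` class, its
``walk`` method, and the ``visit`` / ``text`` / ``depart`` hooks below
are hypothetical names, and this ``asdom`` is a free function instead of
the existing method. Only the ``xml.dom.minidom`` calls are real.

```python
from xml.dom import minidom


class Node:
    """Hypothetical doc-tree element with a generic traversal hook."""

    def __init__(self, *children):
        self.children = list(children)

    @property
    def tagname(self):
        return type(self).__name__

    def walk(self, visitor):
        # Generic traversal: tell the visitor when a node is entered,
        # when text is found, and when the node is left.
        visitor.visit(self)
        for child in self.children:
            if isinstance(child, Node):
                child.walk(visitor)
            else:
                visitor.text(child)
        visitor.depart(self)


class document(Node): pass
class paragraph(Node): pass


class DOMBuilder:
    """Visitor that builds a minidom tree mirroring the doc tree."""

    def __init__(self):
        self.dom = minidom.Document()
        self.stack = [self.dom]  # current parent is stack[-1]

    def visit(self, node):
        element = self.dom.createElement(node.tagname)
        self.stack[-1].appendChild(element)
        self.stack.append(element)

    def text(self, data):
        self.stack[-1].appendChild(self.dom.createTextNode(data))

    def depart(self, node):
        self.stack.pop()


def asdom(tree):
    """asdom expressed as one visitor among many possible ones."""
    builder = DOMBuilder()
    tree.walk(builder)
    return builder.dom


tree = document(paragraph("Hello"))
xml = asdom(tree).documentElement.toxml()
print(xml)
```

An HTML or plain-text writer would then be another class with the same
three hooks, leaving the node classes untouched.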

I do wonder about your comment "especially when you can customize
the ``nodes.list`` class with specialized behaviour". Agreed, it's a
valid advantage. But you don't *use* that advantage. The only
significantly polymorphic aspects of nodes.py are the bits in support of
asdom(). [The astext() method works polymorphically, but I can't see
where I might use it for anything other than a #text node, which makes
the polymorphism moot, so I'm discounting it.]

This isn't to criticise your design. I'm still trying to get a handle on
it from an output point of view. I had started from the assumption that
the DPS doc tree was pretty much inviolate, and I should work with it as
it stands. It looks like there are probably changes needed to support
output. I'm a bit nervous about fiddling with something that central,

>As for processing the data structure once complete, I haven't done much,
>but I'm sure there will be advantages if it's made up of custom objects.

That's a fairly clear confirmation that you believe that there is a need
to incorporate changes in support of output :-)


[I'll comment on the blockquote stuff separately]