[XML-SIG] DOM API

Greg Stein gstein@lyra.org
Sat, 24 Apr 1999 03:41:26 -0700


Paul Prescod wrote:
> ...
> http://www.w3.org/TR/REC-xml
> http://www.w3.org/TR/xml-stylesheet
> http://www.w3.org/TR/NOTE-dcd
> http://www.w3.org/TR/NOTE-ddml
> 
> Well let's put it this way: XML 1.0 uses PIs.

XML 1.0 *defines* PIs. That is very different.

> So does the stylesheet
> binding extension (for CSS and XSL).

This is what I was looking for: the *use* of a PI.

Per my other email (treatise? :-), I think that I've discovered we are
operating within two classes of applications:

* data-oriented use of XML
* layout-oriented use of XML

For the former, I have not seen a case where a PI is necessary. For the
latter: yes, you need a PI for stylesheets. Too bad... you get to use
the DOM :-)

> I don't doubt that namespaces are important but they can easily be viewed
> as an extension of (or layer on top of) the minimal API.

Nope. Namespaces are critical, as Fredrik has pointed out. My endeavors
to use namespaces within the DOM style of programming have also led me
to believe that they aren't a simple extension or layer on top of a minimal
API. Why? Well... if you attempt to post-process the namespace
information, then where do you store it? The client that is doing the
post-processing only receives *proxy* objects. It cannot drop the
information there since those objects are *not* persistent. Instead, the
client has to reach into the internals of the DOM to set (and get!) the
namespace info. Bleck!

> There are four objects there. If we want it to be a tree we need a wrapper
> object that contains them. You could argue that in the lightweight API the
> version and doctype information could disappear but surely we want to
> allow people to figure out what stylesheets are attached to their
> documents!

I maintain that the stylesheets are not applicable to certain classes of
XML processing. So yes, they get punted too.

A simple API of elements and text is more than suitable.
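
As a rough illustration of what "elements and text" can look like in
Python, here is a minimal sketch built on the standard xml.parsers.expat
module. The Elem class and the parse() function are hypothetical names
made up for this example; this is not qp_xml's actual interface. PIs and
comments are dropped simply because no handler is registered for them.

```python
import xml.parsers.expat


class Elem:
    """Hypothetical lightweight element: name, attribute dict, text, children."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs      # plain dict of attribute name -> value
        self.text = ""          # text before the first child element
        self.tail = ""          # text following this element's end tag
        self.children = []      # child Elem objects only; no parent links


def parse(data):
    """Build a tree of Elem objects from an XML string and return the root."""
    stack = []   # currently open elements
    roots = []   # will hold the single root element

    def start(name, attrs):
        elem = Elem(name, attrs)
        if stack:
            stack[-1].children.append(elem)
        else:
            roots.append(elem)
        stack.append(elem)

    def end(name):
        stack.pop()

    def chars(text):
        if not stack:
            return  # ignore anything outside the root element
        current = stack[-1]
        if current.children:
            current.children[-1].tail += text
        else:
            current.text += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return roots[0]


root = parse('<?xml-stylesheet href="a.css"?>'
             '<a x="1">hi<b>there</b>tail<!-- c --></a>')
print(root.name, root.attrs, [c.name for c in root.children])
```

Because children never point back at their parents, the tree contains no
reference cycles, so nothing needs to be explicitly closed after a parse.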

> > NodeType is bogus. It should be absolutely obvious from the context what a
> > Node is. If you have so many objects in your system that you need NodeType
> > to distinguish them, then you are certainly not a light-weight solution.
> 
> XML is a dynamically typed language, like XML. If I have a mix of
> elements, characters and processing instructions then I need some way of
> differentiating them. I don't feel like it is the place of an API to
> decide that XML is a strongly typed language and silently throw away
> important information from the document.

Hello? It *is* the place of the API to define semantics. That is what
APIs do.

I can understand if you don't like this particular semantic, but I feel
your argument is deeply flawed.

> > > Document.DocumentElement (an element node property)
> >
> > If Document has no other properties, then it is totally bogus. Just return
> > the root Element. Why the hell return an object with a single property
> > that refers to another object? Just return that object!
> 
> Document should also have ChildNodes.

Your spec didn't show it. Okay... so it has ChildNodes. How do you get
the root element? Oops. You have to scan for the thing. Painful!
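
To show what that scan looks like, here is a small sketch using the
standard xml.dom.minidom module as a stand-in for a DOM implementation.
The find_root() helper and the sample document are assumptions made for
this example only.

```python
from xml.dom import minidom, Node

# A document whose top level holds a PI and a comment as well as the root.
doc = minidom.parseString(
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    '<!-- note --><root><item/></root>'
)


def find_root(document):
    """Scan the document's children for the first element node."""
    for node in document.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            return node
    return None


root = find_root(doc)
print(root.tagName)
```

A direct accessor (minidom exposes one as documentElement) saves the
client from writing this loop, which is the argument for simply handing
back the root element in the first place.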

> > If you want light-weight, then GetAttribute is bogus given that the same
> > concept is easily handled via the .Attributes value. Why introduce a
> > method to simply do Element.Attributes.get(foo) ??
> 
> GetAttribute is simpler, more direct and maybe more efficient in some
> cases. It works with simple strings and not attribute objects.

It will *never* be more efficient. Accessing a Python attribute and
doing a map-fetch will always be faster than a method call. Plain and
simple.

(caveat: as I mentioned in prior posts, qp_xml should be using a mapping
rather than a list of objects... dunno what I was thinking)
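
For anyone who wants to measure this themselves, the two access styles
can be sketched as below. The Element class here is a hypothetical
stand-in (not the DOM's and not qp_xml's), and the timings will vary by
interpreter and machine, so no numbers are claimed.

```python
import timeit


class Element:
    """Hypothetical element that keeps its attributes in a plain dict."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def GetAttribute(self, name):
        # Convenience method: just forwards to the mapping.
        return self.attrs.get(name)


elem = Element({"href": "style.css"})

via_mapping = elem.attrs.get("href")     # attribute access + map fetch
via_method = elem.GetAttribute("href")   # extra method call on top of that

t_mapping = timeit.timeit(lambda: elem.attrs.get("href"), number=100000)
t_method = timeit.timeit(lambda: elem.GetAttribute("href"), number=100000)
print("mapping: %.4fs  method: %.4fs" % (t_mapping, t_method))
```

Both forms return the same value; the method version does the same
mapping lookup with one more call layered on top.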

> > > Element.TagName
> > > Element.PreviousSibling
> > > Element.NextSibling
> >
> > These Sibling things mean one of two things:
> >
> > 1) you have introduced loops in your data structure
> > 2) you have introduced the requirement for the proxy crap that the current
> > DOM is dealing with (the Node vs _nodeData thing).
> >
> > (1) is mildly unacceptable in a light-weight solution (you don't want
> > people to do a quick parse of data, and then require them to follow it up
> > with .close()).
> 
> I don't see this as a big deal.
> 
> This is an efficiency versus simplicity issue. These functions are
> extremely convenient in a lot of situations.

The origin of qp_xml was for efficiency first, simplicity second. I
maintain that qp_xml provides both.

I will agree to disagree that parents and siblings are useful. (IMO,
they are not, and only serve to complicate the system).

> > Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> > was a serious bitch to simply extract the CDATA contents of an element!
> 
> XML is a dynamically typed language. "I've implemented Java and now I'm
> trying to implement Python and I notice that you guys throw these
> PyObject things around and they make my life harder. I'm going to dump
> them from my implementation."

Again, back to this "dynamically typed language". That is your point of
view, rather than a statement of fact. I won't attempt to characterize
how you derived that point of view (from the DOM maybe?), but it is NOT
the view that I hold.

XML is a means of representing structured data. That structure takes the
form of elements (with attributes) and contained text. I do not see how
XML is a programming language, or that it is dynamically typed. It is
simply a representation in my mind.

And I'll ignore the quote which just seems to be silliness or
flamebait...

> > Moreover, it was also a total bitch to simply say "give me the child
> > elements". Of course, that didn't work since the DOM insisted on returning
> > a list of a mix of CDATA and elements.
> 
> It told you what was in your document.

I also get that from qp_xml with a lot less hassle, so that says to me
that the DOM is introducing needless complexity/hassle for the client.
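
For reference, this is roughly the filtering a DOM client ends up
writing. The sketch uses the standard xml.dom.minidom module as an
example DOM; child_elements() and text_of() are hypothetical helper
names, not part of any DOM specification.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<list>intro<item>one</item> and <item>two</item></list>"
)
root = doc.documentElement


def child_elements(node):
    """Return only the element children, skipping text and other nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def text_of(node):
    """Concatenate the character data found directly under a node."""
    return "".join(
        c.data
        for c in node.childNodes
        if c.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


items = child_elements(root)
print([text_of(item) for item in items])
print(repr(text_of(root)))
```

Every client that only cares about elements, or only about text, has to
repeat the nodeType test somewhere.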

> If you want to include helper functions to do this stuff then I say fine:
> but if you want to throw away the real structure of the document then I
> don't think that that is appropriate.

Helper functions are simply a mechanism to patch the inherent complexity
introduced by the DOM. It does not need to be so complicated. Python has
excellent mechanisms to hold structured data; qp_xml uses them to
provide excellent benefit (relative to the DOM).

The only "structure" that I toss are PIs and comments. I do not view
those as "structure". The contents (elements, attributes, text) are
retained and can be reconstructed from the structure that qp_xml
returns.

> > IMO, the XML DOM model is a neat theoretical expression of OO modelling of
> > an XML document. For all practical purposes, it is nearly useless. (again:
> > IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> > Screw that -- I use "print". I can't imagine generating XML using the DOM.
> > Complicated and processing intensive.
> 
> I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml
> to generate XML in most cases. As you point out "print" or "file.write" is
> sufficient in most applications. This has nothing to do with the DOM and
> everything to do with the fact that writing to a file is inherently a
> streaming operation so a tree usually gets in the way.

Most of the DOM's interface is for *building* a DOM structure. It is
conceivable that those APIs only exist as a way to respond to parsing
events, but I believe their existence is due to the fact that people
want to build a DOM and then generate the resulting XML. Otherwise, we
could have had two levels of the DOM interface: read-only (with private
construction mechanisms), and read-write (as exemplified by the current
DOM).

I believe that the notion of build/generate via the DOM is bogus. It
seems you agree :-), and that print or file.write is more appropriate.
Fredrik has some utility objects to do it. All fine. The DOM just blows
:-)
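
A minimal sketch of the print/file.write approach, assuming nothing more
than the escaping helpers in the standard xml.sax.saxutils module. The
write_element() function is a hypothetical helper for this example; it is
not one of Fredrik's utility objects.

```python
import io
import sys
from xml.sax.saxutils import escape, quoteattr


def write_element(out, name, text, **attrs):
    """Write <name attr="...">text</name> to a file-like object."""
    # quoteattr() adds the surrounding quotes and escapes the value.
    attr_text = "".join(
        " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
    )
    # escape() handles &, < and > in character data.
    out.write("<%s%s>%s</%s>\n" % (name, attr_text, escape(text), name))


buf = io.StringIO()
write_element(buf, "title", "Q & A", lang="en")
sys.stdout.write(buf.getvalue())
```

The only thing the writer has to get right is escaping; no tree is built
at any point.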

> > Sorry to go off here, but the DOM really bugs me. I think it is actually a
> > net-negative for the XML community to deal with the beast. I would love to
> > be educated on the positive benefits for expressing an XML document thru
> > the DOM model.
> 
> I think that the DOM is broken for a completely different set of reasons
> than you do. But the DOM is also hugely popular and more widely
> implemented than many comparable APIs in other domains. I'm told that

I couldn't care less about compatibility. I'm trying to write an
application here. Geez... using your viewpoint: if I wanted
compatibility, then maybe I should use Java or C since everybody else
uses that.

> Microsoft's DOM implementation is referenced in dozens of their products
> and throughout many upcoming technologies. Despite its flaws, the DOM is
> an unqualified success and some people like it more than XML itself. They
> are building DOM interfaces to non-XML data!

Goody for them. That doesn't help me write my application.

> > Use a mapping. Toss the intermediate object. If you just have name and
> > value, then you don't need separate objects. Present the attributes as a
> > mapping.
> 
> In this case I am hamstrung by DOM compatibility. This is a small price to
> pay as long as we keep the simpler GetAttribute methods. The only reason
> to get the attribute objects is when you want to iterate over all
> attributes which is probably relatively rare.

This is why I say "toss the DOM". Help your client programmers, rather
than be subservient to the masses' distorted view of XML programming :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/