Paul Prescod paul@prescod.net
Tue, 20 Apr 1999 09:47:36 -0500

Greg Stein wrote:
> A general comment about your "subset" -- it is still heavyweight!

It wasn't clear what I was optimizing for: performance or simplicity. They
aren't always the same thing.

> euh... I can definitely state that in the applications that I've been
> working with, that PIs are bogus, but namespaces are absolutely required.
> (that's how my code came to be!)

> I have yet to see a specification related to XML that depends on PIs.
> Until that happens, then I don't see how these are relevant.


Well let's put it this way: XML 1.0 uses PIs. So does the stylesheet
binding extension (for CSS and XSL). 

I don't doubt that namespaces are important but they can easily be viewed
as an extension of (or layer on top of) the minimal API.
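
To illustrate the layering idea, here is a rough sketch of namespace
resolution built from nothing but TagName, GetAttribute and the parent link.
It uses Python's xml.dom.minidom as a stand-in for the minimal API under
discussion; resolve_namespace is a hypothetical helper, and the WebDAV-style
sample document is only an illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def resolve_namespace(element):
    """Return (namespace_uri, local_name) for an element.

    Only tagName, hasAttribute/getAttribute and parentNode are used, so this
    could sit on top of an API that knows nothing about namespaces.
    """
    prefix, sep, local = element.tagName.rpartition(":")
    decl = "xmlns:" + prefix if sep else "xmlns"
    node = element
    # Walk towards the root until a matching declaration is found.
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.hasAttribute(decl):
            return node.getAttribute(decl), local
        node = node.parentNode
    return None, local


doc = parseString(
    '<D:multistatus xmlns:D="DAV:"><D:response/><plain/></D:multistatus>'
)
response, plain = doc.documentElement.childNodes
```

Here resolve_namespace(response) finds the declaration on the root element,
while resolve_namespace(plain) falls through and reports no namespace.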

> How is a "document" different in your mind, than an element that happens
> to be the root of a tree? I don't understand from your post. IMO, if you
> want simple, then just give the user a tree... that's all the dumb XML is
> anyhow.

Consider the "canonical Web-enabled XML document":

<?xml version="1.0"?>
<?xml-stylesheet blah blah blah?>
<!DOCTYPE MYDOC SYSTEM "http://...">
<MYDOC>...</MYDOC>

There are four objects there. If we want it to be a tree we need a wrapper
object that contains them. You could argue that in the lightweight API the
version and doctype information could disappear but surely we want to
allow people to figure out what stylesheets are attached to their
documents.
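
As a sketch of what such a wrapper object gives you, the following uses
Python's xml.dom.minidom as a stand-in for the API being proposed. The file
names mydoc.css and mydoc.dtd are made up for the example. Note that minidom
does not expose the XML declaration as a node, so only three of the four
objects show up as children of the document.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="mydoc.css" type="text/css"?>'
    '<!DOCTYPE MYDOC SYSTEM "mydoc.dtd">'
    '<MYDOC/>'
)

doc = parseString(SOURCE)

# The document node holds everything that lives outside the root element.
kinds = [child.nodeType for child in doc.childNodes]

# Stylesheet bindings are processing instructions at the document level.
stylesheets = [
    child.data
    for child in doc.childNodes
    if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    and child.target == "xml-stylesheet"
]
```

If the parser handed back only the root element, the stylesheet binding
would have nowhere to live.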

> NodeType is bogus. It should be absolutely obvious from the context what a
> Node is. If you have so many objects in your system that you need NodeType
> to distinguish them, then you are certainly not a light-weight solution.

XML is a dynamically typed language, like Python. If I have a mix of
elements, characters and processing instructions then I need some way of
differentiating them. I don't feel like it is the place of an API to
decide that XML is a strongly typed language and silently throw away
important information from the document.
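
A small sketch of why a node type is useful with mixed content, again using
xml.dom.minidom as a stand-in. The describe function and the sample markup
are hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString('<p>Hello <b>world</b><?page-break?>!</p>')


def describe(node):
    """Dispatch on nodeType, since the children are a mix of kinds."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element:" + node.tagName
    if node.nodeType == Node.TEXT_NODE:
        return "text:" + node.data
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return "pi:" + node.target
    return "other"


summary = [describe(child) for child in doc.documentElement.childNodes]
```

The children of <p> come back in document order as text, element,
processing instruction, text. Nothing is dropped.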

> > Document.DocumentElement (an element node property)
> If Document has no other properties, then it is totally bogus. Just return
> the root Element. Why the hell return an object with a single property
> that refers to another object? Just return that object!

Document should also have ChildNodes.

> If you want light-weight, then GetAttribute is bogus given that the same
> concept is easily handled via the .Attributes value. Why introduce a
> method to simply do Element.Attributes.get(foo) ??

GetAttribute is simpler, more direct and maybe more efficient in some
cases. It works with simple strings and not attribute objects.
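
For comparison, a sketch of the two routes in xml.dom.minidom (used here only
as a stand-in; the sample element is made up):

```python
from xml.dom.minidom import parseString

link = parseString('<link href="style.css" type="text/css"/>').documentElement

# Direct route: a plain string in, a plain string out.
direct = link.getAttribute("href")

# Route through the attribute objects: look up the node, then ask for its value.
via_objects = link.attributes["href"].value

# In minidom a missing attribute yields an empty string rather than an error.
missing = link.getAttribute("media")
```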

> > Element.TagName
> > Element.PreviousSibling
> > Element.NextSibling
> These Sibling things mean one of two things:
> 1) you have introduced loops in your data structure
> 2) you have introduced the requirement for the proxy crap that the current
> DOM is dealing with (the Node vs _nodeData thing).
> (1) is mildly unacceptable in a light-weight solution (you don't want
> people to do a quick parse of data, and then require them to follow it up
> with .close()). 

I don't see this as a big deal.

This is an efficiency versus simplicity issue. These functions are
extremely convenient in a lot of situations.
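
A sketch of the convenience and of the cost, using xml.dom.minidom as a
stand-in. The sibling links let you walk along a level without going back to
the parent; minidom's unlink() is its way of breaking the resulting reference
cycles when you are done with the tree.

```python
from xml.dom.minidom import parseString

doc = parseString('<list><item>a</item><item>b</item><item>c</item></list>')

# Walk along one level of the tree by following nextSibling.
names = []
node = doc.documentElement.firstChild
while node is not None:
    names.append(node.firstChild.data)
    node = node.nextSibling

# Explicit cleanup: unlink() breaks the parent/sibling cycles.
doc.unlink()
```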

> Case in point: I wrote a first draft davlib.py against the DOM. Damn it
> was a serious bitch to simply extract the CDATA contents of an element!

XML is a dynamically typed language. "I've implemented Java and now I'm
trying to implement Python and I notice that you guys throw these
PyObject things around and they make my life harder. I'm going to dump
them from my implementation." 

> Moreover, it was also a total bitch to simply say "give me the child
> elements". Of course, that didn't work since the DOM insisted on returning
> a list of a mix of CDATA and elements.

It told you what was in your document.

If you want to include helper functions to do this stuff then I say fine:
but if you want to throw away the real structure of the document then I
don't think that that is appropriate.
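
The kind of helper functions I have in mind could look roughly like this.
This is a sketch on top of xml.dom.minidom; child_elements and text_content
are hypothetical names, and the underlying tree still contains everything
that was in the document.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def child_elements(node):
    """Return only the element children, skipping text and other nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def text_content(node):
    """Concatenate the character data found directly inside node."""
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(c.data for c in node.childNodes if c.nodeType in wanted)


doc = parseString(
    '<prop>\n  <name>report</name>\n  <note><![CDATA[a < b]]></note>\n</prop>'
)
root = doc.documentElement
name, note = child_elements(root)
```

The whitespace between the elements is still present in root.childNodes for
anyone who needs it; the helpers just save the caller from filtering by hand.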

> IMO, the XML DOM model is a neat theoretical expression of OO modelling of
> an XML document. For all practical purposes, it is nearly useless. (again:
> IMO) ... I mean hey: does anybody actually use the DOM to *generate* XML?
> Screw that -- I use "print". I can't imagine generating XML using the DOM.
> Complicated and processing intensive.

I'm not sure what your point is here. I wouldn't use the DOM *or* qp_xml
to generate XML in most cases. As you point out "print" or "file.write" is
sufficient in most applications. This has nothing to do with the DOM and
everything to do with the fact that writing to a file is inherently a
streaming operation so a tree usually gets in the way.
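
For what it is worth, a streaming writer along those lines might look like
this. It is a sketch only: write_items and the <items> vocabulary are made up,
while escape and quoteattr come from Python's xml.sax.saxutils.

```python
import io
from xml.sax.saxutils import escape, quoteattr


def write_items(out, items):
    """Write (name, text) pairs straight to a file-like object, no tree needed."""
    out.write("<items>\n")
    for name, text in items:
        # quoteattr adds the surrounding quotes; escape handles &, < and >.
        out.write("  <item name=%s>%s</item>\n" % (quoteattr(name), escape(text)))
    out.write("</items>\n")


buffer = io.StringIO()
write_items(buffer, [("a&b", "1 < 2")])
result = buffer.getvalue()
```

The only thing the writer has to get right is escaping, which is why even
"print" based generators usually want a couple of small helpers.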

> Sorry to go off here, but the DOM really bugs me. I think it is actually a
> net-negative for the XML community to deal with the beast. I would love to
> be educated on the positive benefits for expressing an XML document thru
> the DOM model.

I think that the DOM is broken for a completely different set of reasons
than you do. But the DOM is also hugely popular and more widely
implemented than many comparable APIs in other domains. I'm told that
Microsoft's DOM implementation is referenced in dozens of their products
and throughout many upcoming technologies. Despite its flaws, the DOM is
an unqualified success and some people like it more than XML itself. They
are building DOM interfaces to non-XML data!

> Use a mapping. Toss the intermediate object. If you just have name and
> value, then you don't need separate objects. Present the attributes as a
> mapping.

In this case I am hamstrung by DOM compatibility. This is a small price to
pay as long as we keep the simpler GetAttribute methods. The only reason
to get the attribute objects is when you want to iterate over all
attributes, which is probably relatively rare.
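
For the rare case where you do want all of them, the attribute objects can be
flattened into a mapping in one line. A sketch using xml.dom.minidom as a
stand-in, with a made-up sample element:

```python
from xml.dom.minidom import parseString

img = parseString('<img src="a.png" alt="A"/>').documentElement

# items() yields (name, value) pairs, so a plain dict is one call away.
attrs = dict(img.attributes.items())
```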


 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself

"The Excursion [Sport Utility Vehicle] is so large that it will come
equipped with adjustable pedals to fit smaller drivers and sensor 
devices that warn the driver when he or she is about to back into a
Toyota or some other object." -- Dallas Morning News