[XML-SIG] DOM API

uche.ogbuji@fourthought.com
Sun, 25 Apr 1999 10:14:31 -0600


> As I said in my other messages, I want minidom to be a subset of PyDOM and 4DOM
> and hopefully the start of a common API. In that vein, minidom makes some
> decisions and extensions that we should discuss:
> 
> dom = DOMFromString( string, SAXbuilder=None )
> dom = DOMFromURL( URL, SAXbuilder=None )
> dom = DOMFromFile( file, SAXbuilder=None )

As I've mentioned, 4DOM already supports these functions, if under different 
names (which we don't mind normalizing to any names that are generally agreed 
upon).  We do have a few additional parameters, though, which I think are 
essential for strict DOM compliance.  I realize that is not a key goal of 
PyDOM and minidom, but the parameters are probably fodder for discussion.

def FromXML(      xmlStr,
                  ownerDocument=None,
                  validate=0,
                  keepAllWS=0,
                  catName=None,
                  SAXHandlerClass=XMLDOMGenerator)

* ownerDocument allows us to set this property for generated nodes.  If None, we
create a new Document node from the factory and add the built nodes to the 
document.  If the ownerDocument _is_ set, the new nodes are not added to the
document, and a DocumentFragment is returned instead.  This behavior 
corresponds to most of the use-cases we determined for building.

* validate tells the parser whether or not to validate the document.

* keepAllWS basically tells the SAX handler whether to discard 
ignorable_whitespace.

* catName is for Xcatalog support (xmlproc only).  I don't think this needs to 
be considered for a unified DOMFromString.

* SAXHandlerClass is our equivalent of your SAXBuilder
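
The ownerDocument and keepAllWS behavior described above can be sketched
roughly as follows.  This is only an illustration, not 4DOM's code: it uses
the xml.dom.minidom module from the Python standard library, the names
from_xml and _strip_blank_text are hypothetical, and dropping
whitespace-only text nodes is just an approximation of discarding
ignorable whitespace (which strictly depends on the DTD).

```python
from xml.dom import minidom, Node


def _strip_blank_text(node):
    """Remove whitespace-only text nodes below `node` (an approximation
    of discarding ignorable whitespace; no DTD is consulted)."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            _strip_blank_text(child)


def from_xml(xml_str, owner_document=None, keep_all_ws=False):
    """Hypothetical builder: return a new Document, or a DocumentFragment
    owned by `owner_document` when one is supplied."""
    parsed = minidom.parseString(xml_str)
    if not keep_all_ws:
        _strip_blank_text(parsed.documentElement)
    if owner_document is None:
        # No owner given: hand back the freshly built Document.
        return parsed
    # Owner given: copy the built nodes into a fragment owned by that
    # document, without attaching them to the document tree itself.
    fragment = owner_document.createDocumentFragment()
    fragment.appendChild(
        owner_document.importNode(parsed.documentElement, True))
    return fragment
```

With no owner, from_xml("<a/>") yields a Document; with an owner, the
caller gets a DocumentFragment and decides where to attach it.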

> The default SAXBuilder would probably be the PyDOM or minidom builder.
> 
> Minidom uses mixed lower-first for property names. For compatibility with
> PyDOM, properties can be requested through get_ methods. My question is:
> do we really need get_ methods? They don't seem very Pythonish to me. Or
> maybe we can use them as implementation mechanism (_get_) but not expose
> them to the client.
> 
> I prefer the class-specific properties to the weird generic ones: tagName
> to nodeName, value to nodeValue and so forth. Obviously PyDOM and 4DOM
> would implement both but I don't see any reason to support that redundancy
> in minidom.
> 
> I made some namespace extensions because we can't wait forever to do
> namespace support.
> 
> getAttribute( "foo", "http://www.blah.bar" )
> 
> Looks up the obvious attribute.
> 
> element.localName gets the second half of the element type name.
> 
> element.uri gets the URI associated with the prefix.
> 
> element.prefix gets the element's prefix. I don't think that the
> namespaces view that prefixes are irrelevant should obviate the XML 1.0
> view that they are NOT. Even if we accept the namespaces view of the world
> entirely, prefixes are chosen to be mnemonic so they shouldn't be
> discarded by software.
> 
> element.attributes returns an attribute mapping object that I think
> behaves exactly like PyDOM's except for namespace support:
> 
> x.attributes["foo", "http://www.blah.bar"]
> 
> This also works, however:
> 
> x.attributes["bar:foo"] (just as in PyDOM)
> 
> Namespace attributes ARE maintained as attributes. keys(), items() and
> values() should be the same as PyDOM.

We might consider this for Namespace support for 4DOM, although we had been 
planning to wait for W3C to jump, so that we could maintain 
standards-compliance.  Right now 4DOM just treats namespaces entirely 
opaquely, i.e. ignores them.  Maybe there is a way to add your above 
suggestions to DOM.Ext.
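
For comparison, here is a small sketch of namespace-aware lookups using
the xml.dom.minidom module in the Python standard library.  It is not the
extension API quoted above: the standard library spells the lookup
getAttributeNS and takes the URI first, and the property is namespaceURI
rather than uri.  The helper name namespace_view and the sample document
are made up for illustration.

```python
from xml.dom import minidom

BLAH_NS = "http://www.blah.bar"  # namespace URI from the quoted example


def namespace_view(xml_str):
    """Return the namespace-related properties of the document element."""
    element = minidom.parseString(xml_str).documentElement
    return {
        "localName": element.localName,        # part after the colon
        "prefix": element.prefix,              # part before the colon
        "namespaceURI": element.namespaceURI,  # URI bound to the prefix
        # Lookup by (namespace URI, local name).
        "by_namespace": element.getAttributeNS(BLAH_NS, "foo"),
        # Lookup by qualified name still works.
        "by_qualified_name": element.getAttribute("bar:foo"),
        # The attribute mapping accepts a (URI, local name) tuple.
        "mapping_lookup": element.attributes[(BLAH_NS, "foo")].value,
    }


info = namespace_view(
    '<bar:item xmlns:bar="http://www.blah.bar" bar:foo="1"/>')
```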

> I should unify my Error class with PyDOM's.
> 
> I am considering the following enhancements:
> 
> element.elements: returns a list of element children.

In full DOM, this is trivial using Level 2 iterators.  We'd have no problem 
adding a wrapper function to DOM.Ext, though.
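
Such a wrapper could be as small as the sketch below.  It filters
childNodes with a list comprehension rather than using Level 2 iterators,
it runs against xml.dom.minidom from the Python standard library, and the
name child_elements is only a placeholder.

```python
from xml.dom import minidom, Node


def child_elements(node):
    """Return the direct children of `node` that are elements."""
    return [child for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


doc = minidom.parseString("<a>text<b/><!-- note --><c><d/></c></a>")
names = [e.tagName for e in child_elements(doc.documentElement)]
```

Text, comments and grandchildren such as <d/> are skipped; only <b/> and
<c> come back.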

> element.getText: returns a deep list of data from the text nodes.
> Do your own string.join to choose an appropriate join character.

I'm not sure how useful this is if we omit the semantics of nested elements.  
I would see more use for a method that simply returns the XML text within an 
element, including nested tags.
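
To make the difference concrete, here is a sketch of both approaches on
top of xml.dom.minidom from the Python standard library.  The function
names text_chunks and inner_xml are hypothetical, and text_chunks returns
a flat list, which is one possible reading of the proposal.

```python
from xml.dom import minidom, Node


def text_chunks(node):
    """Collect the data of all descendant text nodes, in document order.
    The element structure around the text is lost."""
    chunks = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            chunks.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            chunks.extend(text_chunks(child))
    return chunks


def inner_xml(element):
    """Serialize the children of `element`, keeping nested tags."""
    return "".join(child.toxml() for child in element.childNodes)


para = minidom.parseString("<p>Hello <b>bold</b> world</p>").documentElement
flat = "".join(text_chunks(para))   # markup dropped
nested = inner_xml(para)            # markup kept
```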

> element.getChild("FOO") returns the first child (not descendant) element
> with specified element type name.

I've never had a need for such a method.  I often need all such elements, in 
which case I just use getElementsByTagName.
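
One caveat is that the two are not interchangeable: getElementsByTagName
searches all descendants, while the proposed getChild looks only at
direct children.  A sketch of the difference, using xml.dom.minidom from
the Python standard library and a hypothetical get_child helper:

```python
from xml.dom import minidom, Node


def get_child(element, tag_name):
    """Return the first direct child element named `tag_name`, or None."""
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == tag_name:
            return child
    return None


root = minidom.parseString(
    '<a><x><FOO id="deep"/></x><FOO id="top"/></a>').documentElement

first_child = get_child(root, "FOO")           # direct children only
all_matches = root.getElementsByTagName("FOO")  # every descendant
```

Here get_child finds the FOO with id "top", whereas the first item from
getElementsByTagName is the nested one with id "deep".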

> element.getChild("FOO", "http://...") does the obvious thing.
> 
> element.getChild( "#PCDATA" ) gets a list of child text nodes.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org