[XML-SIG] DOM code now in CVS tree

A.M. Kuchling akuchlin@cnri.reston.va.us
Sun, 4 Oct 1998 18:25:19 -0400


I have (finally) checked the DOM code into the anonymous CVS
repository.

Some random notes:
==================

	* I am definitely not claiming completeness.  There are
objects and sections of code that have never been executed yet.  But
it is possible to build a little DOM tree.  Wide string support isn't
in there, and won't be until Python supports it.

        * Attributes in the IDL DOM definition are implemented as
methods, prefixed with get_.  For example, to retrieve the childNodes
attribute, you call .get_childNodes() .

	This is not an optimal solution.  Ideally it would be as
simple as accessing .childNodes, but this faces some problems.  First,
the low-level tree of elements is implemented as a tree of _nodeData
instances, with no circular references.  Element, Text, etc. instances
are then implemented as proxies for the corresponding _nodeData
instance.  This provides two useful features:

	   1) If there are several proxies for a given _nodeData,
	      changes are instantly visible to all of the proxies. 
	   2) As Ken McLeod suggested, this should avoid circular
	      references, and, therefore, memory leaks.  (I haven't
	      actually verified that there are no leaks, though.) 

	However, this also means that just setting up static
childNodes attributes would break property 1.  So you'd have to write
__getattr__/__setattr__ functions.  I'm worried that this would exact
too great a performance penalty.  Thoughts?
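
To make the trade-off concrete, here is a minimal sketch of what the
__getattr__/__setattr__ approach could look like.  It is not the code in
the CVS tree: NodeData and NodeProxy are hypothetical stand-ins for
_nodeData and the Element/Text proxies, and the attribute names handled
are only the two used in the example.

```python
class NodeData:
    """Hypothetical stand-in for a _nodeData record: plain data, no parent links."""
    def __init__(self, name):
        self.name = name
        self.children = []   # child NodeData records only, so no cycles


class NodeProxy:
    """Hypothetical proxy; computed attributes are read from the shared record."""
    def __init__(self, data):
        # Store through __dict__ so the __setattr__ hook below is bypassed.
        self.__dict__['_data'] = data

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for computed attributes.
        if attr == 'childNodes':
            return [NodeProxy(child) for child in self._data.children]
        if attr == 'nodeName':
            return self._data.name
        raise AttributeError(attr)

    def __setattr__(self, attr, value):
        # Writes go to the shared record, never to the proxy itself.
        if attr == 'nodeName':
            self._data.name = value
        else:
            raise AttributeError('read-only or unknown attribute: ' + attr)

    def appendChild(self, child):
        self._data.children.append(child._data)
        return child


root = NodeData('html')
a = NodeProxy(root)
b = NodeProxy(root)
a.appendChild(NodeProxy(NodeData('head')))
print(len(b.childNodes))   # b sees the child that was added through a
```

Because nothing is cached on the proxy, property 1 holds, but every
attribute access pays for a failed normal lookup plus a Python-level
function call, which is the performance cost in question.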

        * You can add extended interfaces and still remain compliant
with the Recommendation.  Therefore, feel free to suggest other useful
things.  I've added a few convenient aliases: the DOM spec specifies
get_nodeName() and get_nodeValue(), so I've added get_name() and
get_value().  We can add other aliases to fix other annoyances.
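
One cheap way to provide such aliases is a class-level assignment.  The
sketch below assumes a hypothetical Node class with _name and _value
fields; it only illustrates the aliasing idea and is not the actual
class in the dom/ directory.

```python
class Node:
    """Hypothetical node class, used only to show the aliasing idea."""
    def __init__(self, name, value=None):
        self._name = name
        self._value = value

    # Accessors named after the DOM attributes.
    def get_nodeName(self):
        return self._name

    def get_nodeValue(self):
        return self._value

    # Convenience aliases: the same function objects under shorter names.
    get_name = get_nodeName
    get_value = get_nodeValue


n = Node('title', 'Title goes here')
print(n.get_name() == n.get_nodeName())
```

One caveat with this style: the alias is bound to the function defined
in this class, so a subclass that overrides get_nodeName() would also
have to rebind get_name.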

	* Not all the stuff in the dom/ subdirectory has been fixed to
match the current DOM interfaces.  walker.py, writer.py, builder.py,
and sax_builder.py have been quickly patched up, but they haven't been
tested much.

	* Following the responses to my query about preserving
compatibility, there's no attempt to preserve compatibility with the
earlier DOM code at all.

	* There's no attempt yet at representing DTD information.
Some degree of this must be done, however, in order to handle default
attribute values.  (If an attribute of an element has a default value,
then you can override its value by assigning to it; if you then delete
that attribute, the default value springs back.)
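
The default-value behaviour described in the parenthesis can be sketched
with two dictionaries.  AttributeStore and its defaults argument are
hypothetical; in a real implementation the defaults would come from the
DTD information that isn't represented yet.

```python
class AttributeStore:
    """Hypothetical attribute store; `defaults` stands in for DTD-declared values."""
    def __init__(self, defaults):
        self._defaults = dict(defaults)   # values declared in the DTD
        self._explicit = {}               # values set by the document or program

    def getAttribute(self, name):
        # An explicit value wins; otherwise fall back to the default, if any.
        if name in self._explicit:
            return self._explicit[name]
        return self._defaults.get(name, '')

    def setAttribute(self, name, value):
        self._explicit[name] = value

    def removeAttribute(self, name):
        # Removing only discards the override; the default stays visible.
        self._explicit.pop(name, None)


attrs = AttributeStore({'align': 'left'})
attrs.setAttribute('align', 'center')
attrs.removeAttribute('align')
print(attrs.getAttribute('align'))   # the default is back
```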

        * How should we linearise a DOM tree?  Currently the __str__
method sort of does this; however, if the text is '>', should
str(TextNode) return '>' or '&gt;'?  It's probably a better idea to 
have __repr__ convert nodes to a helpful description of the object
(for debugging), and __str__ will return node.get_nodeData() for some
node types such as Element and Comment; other nodes would simply have
__str__ the same as __repr__.  

        A third method could then linearise a node and its
descendants; this would also allow adding some method arguments to
control whether the XML produced is pretty-printed, indentation
parameters, etc.  Or, we can keep this in another class, as it is now;
I'd prefer to have it be available in the core, though.
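
A rough sketch of that three-way split follows: __repr__ for debugging,
__str__ for the raw data, and a separate method for linearising.  The
Text and Element classes, the escape() helper, and the toxml() name and
its indent argument are all made up for illustration and don't match the
classes in the CVS tree.

```python
def escape(text):
    """Escape the characters that are special in XML character data."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


class Text:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return '<Text node %r>' % self.data    # debugging description

    def __str__(self):
        return self.data                       # raw data, not escaped

    def toxml(self, indent='', level=0):
        return indent * level + escape(self.data)


class Element:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __repr__(self):                        # str() falls back to this
        return '<Element node %r>' % self.name

    def toxml(self, indent='', level=0):
        pad = indent * level
        if not self.children:
            return '%s<%s/>' % (pad, self.name)
        sep = '\n' if indent else ''           # newlines only when indenting
        inner = sep.join([c.toxml(indent, level + 1) for c in self.children])
        return '%s<%s>%s%s%s%s</%s>' % (pad, self.name, sep, inner,
                                        sep, pad, self.name)


tree = Element('p', [Text('1 > 0'), Element('br')])
print(str(tree.children[0]))    # unescaped text
print(tree.toxml())             # compact, with '>' escaped
print(tree.toxml(indent='  '))  # pretty-printed
```

Note that indenting this naively adds whitespace around text nodes, so a
real pretty-printer would need a policy for mixed content.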

Where do we go now?
===================

        * I hope that people will grab the latest CVS snapshot, and
try the code out, reporting bugs and non-compliant points.  Bonus
points for providing a patch.

       * In the meantime, I'm going to work on adding docstrings to
the code, implement the few missing things, fix any reported bugs, and
begin work on a DOM test suite (which is going to be a sizable job).

       * Once the DOM code has been shaken down for a while, it'll be
time for a 0.5 release.  If no one tries the DOM code, the 0.5 release
may be delayed for quite a while, so please try it out.

    Oh, yes; some sample code...

from xml.dom import core

doc = core.createDocument()
html = doc.createElement('html')
html.setAttribute('attr', 'value')

head = doc.createElement('head')
title = doc.createElement('title')
title.appendChild( doc.createTextNode("Title goes here") )

body = doc.createElement('body')
body.setAttribute('background', '#ffffff')
comment = doc.createComment("Comment between head and body")

doc.appendChild(html)
html.appendChild(head)
head.appendChild(title)
html.appendChild(body)

print doc

This outputs:

<?xml version="1.0"?>
<!DOCTYPE XXX>   
<html attr='value'><head><title>Title goes here</title></head><body background='#ffffff'/></html>

(No DTD info yet, remember.)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
When a man tells you that he got rich through hard work, ask him *whose*?
    -- Don Marquis