[XML-SIG] DOM: Multiple proxy problem

Andrew M. Kuchling akuchlin@cnri.reston.va.us
Tue, 6 Oct 1998 12:00:33 -0400 (EDT)


I've realized there's a potential problem with the current DOM
implementation, and am wondering how to fix it.

Problem:
========

The basic structure of a DOM document tree is stored as a tree of
_nodeData instances; each instance contains data about that node
(type, value, children, etc.).  Instances don't hold a reference to
their parent, because this causes cycles which make Python's
refcounting GC fail.  (I may eventually change _nodeData instances
into lists, which may reduce the memory usage and improve the speed.
That's irrelevant for the moment.)

A user of the DOM interface will never come into contact with a
_nodeData instance; instead, the user deals with Element, Text,
CDATASection, ... instances, which contain some attributes (including,
most importantly, the parent node) and delegates other attributes to
the _nodeData instance that's being proxied.  This avoids creating
cycles, and the garbage collector and user frolic happily.

However... consider having multiple proxies for the same _nodeData
instance.  Here's a tree:

   Element 'title'
      Text 'chunk of text'
      Text 'another chunk'

Call the two text nodes t1 and t2.  Get two references to t2, by
something like:

ref1 = Element.get_childNodes()[1]
ref2 = Element.get_childNodes()[1]

Now, move t2 somewhere else:
OtherElement.appendChild( ref1 )

This automatically deletes t2 from its parent's list of children, adds
it to the other element, and updates ref1's parent pointer.  ref2's
pointer is now out of date; if you then do
YetAnotherElement.appendChild( ref2 ), things will go funny.

Solution (?)
============

	How do we solve this?  Given a _nodeData instance, there's no
fast way to find its parent; you can start at the root and crawl the
tree, but this isn't a complete solution, because the node may have
been removed from the tree and added to a DocumentFragment.  (Can we
invent a way of quickly finding the parent, without reintroducing
cycles?  This would fix the problem nicely, and in fact probably
remove the need for proxies completely.)

	We can, at least, detect the problem; if appendChild finds
that t2 isn't actually a child of the purported parent node, it can
raise an exception.  I'll implement that sanity check in the meantime,
and may just leave it in; if people encounter that exception
frequently, the problem is worth fixing more carefully.

	I'd really appreciate hearing some suggestions about this.

	BTW, using PyExpat on my Ultra 10, the 279K hamlet.xml file is
converted to a DOM tree in about 7 seconds; about 40K/sec. The DOM
tree seems to be correct, though it's leaving in Text nodes containing
ignorable whitespace; got to fix that...

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
I didn't say it was my fault. I said it was my responsibility. I know the
difference.
    -- Rose Walker, in SANDMAN #60: "The Kindly Ones:4"