[lxml] Support integration with other tree changing libxml2 based libraries

June 23, 2012

I am working on an integration of `lxml` and `libxmlsec` (the
XML security library) and I have hit an important problem:
`libxmlsec` functions can change the libxml2 document (tree)
and thereby seriously confuse `lxml`.

The major problem is that `libxmlsec` may unlink and release subtrees
leading to a `SIGSEGV` in `lxml` code when it later accesses those subtrees.
Fortunately, `libxmlsec` can be told not to release unlinked
subtrees but to leave that to the application. But then my application
must do it: release the subtree if and only if `lxml` will not do
so at a later time (which it will if it holds a reference to some node
in the subtree). Looking at the public `lxml` API, I have not found
a function for this. I have come up with the following first version
of an `lxml_safe_release`:

cdef int lxml_safe_release(_Document doc, xmlNode* c_node) except -1:
  # we let `lxml` get rid of the subtree by wrapping *c_node* into a
  #  proxy and then releasing it again.
  # `elementFactory` returns a Python object, never NULL; if it raises,
  #  the exception propagates through the `except -1` declaration.
  elementFactory(doc, c_node)
  return 0

I hope that this will be sufficient to prevent the `SIGSEGV`.
However, I doubt that it is enough to make references into
unlinked subtrees really work correctly. In similar situations,
`lxml` calls `moveNodeToDocument` in order to make the namespace
references inside the unlinked subtree self-contained.
`moveNodeToDocument` is not public and far too complicated for me to
want to include a copy in my code.

I propose that future `lxml` versions should include a public
`safe_release` function for such purposes.

Another, less serious problem: some `libxmlsec` functions
replace a node inside the tree (e.g. a node is replaced by an
`EncryptedData` node representing the node in an encrypted form).
It would be nice if I could "retarget" an `lxml` proxy referencing
the replaced node to point to the replacing node. This way,
`lxml` objects with references to the proxy would see the new
state rather than the confusing picture resulting from the proxy
now referring to an unlinked node.

Of course, the "retarget"ing is not trivial. It is not sufficient
to give the proxy a new "_c_node"; its class, too, might need to
be adapted. This would be possible as long as the two classes
had the same "C" layout for their objects. Is `lxml` supposed
to support proxy classes with differing "C" layouts? (I expect "yes"
as the answer.)
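
To illustrate the layout constraint, here is a small analogy in pure
Python. It is not `lxml` code: the class names, the `_cache` slot and the
`retarget` helper are all hypothetical. It only shows that Python itself
permits switching an object's class when both classes have the same
object layout, and refuses otherwise.

```python
class ElementProxy:
    """Hypothetical proxy holding a single node reference."""
    __slots__ = ("_c_node",)


class EncryptedProxy:
    """Hypothetical proxy class with the same layout as ElementProxy."""
    __slots__ = ("_c_node",)


class ExtendedProxy:
    """Hypothetical proxy class with an extra field, so a different layout."""
    __slots__ = ("_c_node", "_cache")


def retarget(proxy, new_node, new_class):
    """Switch *proxy* to *new_class* and point it at *new_node*.

    Returns True on success and False if Python rejects the class change
    because the object layouts differ.
    """
    try:
        proxy.__class__ = new_class
    except TypeError:
        return False
    proxy._c_node = new_node
    return True


proxy = ElementProxy()
proxy._c_node = "secret"

# Same layout: the class change is accepted.
same_layout_ok = retarget(proxy, "EncryptedData", EncryptedProxy)

# Different layout: the class change is rejected, the proxy is untouched.
different_layout_ok = retarget(proxy, "EncryptedData", ExtendedProxy)
```

Whether something comparable is feasible for `lxml`'s own proxy types is
exactly the open question above.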

For the moment, I will tell the users of my `libxmlsec` binding:
forget any `lxml` reference into an encrypted or decrypted document,
including a reference to its root tree, and always rebuild
references from the operation's return value.
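
The effect on a user can be sketched with plain `lxml`. In this sketch
`lxml`'s own `replace` merely stands in for a `libxmlsec` operation (that
substitution is an assumption made for illustration, no `libxmlsec` call
is involved), and the element names are made up.

```python
from lxml import etree

root = etree.fromstring("<doc><secret>payload</secret><public/></doc>")
stale = root[0]  # proxy obtained before the tree is changed

# Stand-in for an encryption call: swap the node for a placeholder element.
replacement = etree.Element("EncryptedData")
root.replace(stale, replacement)

# The old proxy is still usable, but it now describes an unlinked node.
stale_parent = stale.getparent()

# A reference rebuilt from the tree sees the new state.
fresh = root[0]
```

Here `stale` still reports the tag `secret` but no longer has a parent,
while `fresh` is the `EncryptedData` element; this is the picture the
advice above is meant to avoid.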

Dieter Maurer