Re: [lxml] Support integration with other tree changing libxml2 based libraries

24 Jun 2012

Stefan Behnel <stefan_ml@behnel.de> writes:
...
Dieter Maurer, 23.06.2012 12:02:
...
...
I propose that future `lxml` versions should include a public
`safe_release` function for such purposes.
Maybe a new "removeNodeFromDocument()" API function could first check for
proxies, and then either deallocate or fix up the tree to be stand-alone.
That would be ideal.
...
...
...
Another, but less serious problem: some `libxmlsec` functions
replace a node inside the tree (e.g. a node is replaced by an
`EncryptedData` node representing the node in an encrypted form).
It would be nice if I could "retarget" an `lxml` proxy referencing
the replaced node to point to the replacing node. This way,
`lxml` objects with references to the proxy would see the new
state rather than the confusing picture resulting from the proxy
now referring to an unlinked node.
...
Of course, the "retarget"ing is not trivial. It is not sufficient
to give the proxy a new "_c_node"; its class, too, might need to
be adapted. This would be possible as long as the two classes
had the same "C" layout for their objects. Is `lxml` supposed
to support proxy classes with differing "C" layout? (I expect "yes"
as the answer.)
From the POV of lxml the proxy is just a reference to an object of type (or
subtype of) _Element. The problem is that the user most likely holds
another reference to it
This means, one cannot replace the proxy object by a new one
but one could change the proxy object content (e.g. set a new "_c_node",
set a new "__class__").

As I understood, "lxml" ensures that there is at most one proxy
for any given "c_node" (by putting a proxy reference into the
"_private" of the "c_node"). Thereby, changing the proxy content
changes all "views" of the "lxml" application on the respective
"c_node".
...
and there is no way we can exchange the object (or
even its class) that that reference points to. These things are a lot less
trivial at the C level than in Python (and even there they can have
surprising side effects).
I am not sure that I understand your argument (though I fully
appreciate your reluctance to provide a public API).

In my case, I am not inside a complicated `lxml` context where
`lxml` code could hold direct references to internal attributes
of the proxy I want to retarget. The only such references
are in my binding function -- and of course, I must ensure that
they do not get confused.
...
...
For the moment, I will tell the user of my `libxmlsec` binding:
forget any `lxml` reference into an encrypted or decrypted document,
including a reference to its root tree, and always rebuild
references from the operation's return value.
Basically, what this means is that Elements that the user holds a reference
to won't change during the transformation but may no longer be at their
original place afterwards.
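
A small sketch of that behaviour, using the standard library's
`xml.etree.ElementTree` rather than `lxml`, and a made-up
`fake_encrypt()` helper standing in for the real in-place
transformation (it only swaps the node, it does not encrypt anything):

```python
import xml.etree.ElementTree as ET


def fake_encrypt(parent, child):
    """Hypothetical stand-in for an in-place transformation.

    Replaces `child` inside `parent` by a new `EncryptedData` element
    and returns the replacement.
    """
    index = list(parent).index(child)
    replacement = ET.Element("EncryptedData")
    parent[index] = replacement
    return replacement


root = ET.fromstring("<Envelope><Body>secret</Body></Envelope>")
body = root[0]                     # reference held by the user

result = fake_encrypt(root, body)

# The old reference is unchanged but no longer part of the tree.
assert body.tag == "Body" and body.text == "secret"
assert body not in list(root)

# The reference rebuilt from the return value reflects the new state.
assert root[0] is result
assert [child.tag for child in root] == ["EncryptedData"]
```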
The worst behaviour I have observed:

  doc = parse(StringIO("<?...><Envelope>...</Envelope>"))
  encrypt(..., doc.getroot())
  print(tostring(doc))
  <Envelope>...</Envelope>

That means that encrypting the root node of an "_ElementTree"
has stripped this tree of its processing instruction and its comment.

I understand why this happens but from a user perspective, it can
be really surprising.
...
Perfectly reasonable if you ask me, because
changing the tree is the whole point of doing that transformation. The same
happens in XInclude, for example. Or even just when you change the tag name
of an Element. None of those cases replaces the implementation of an
Element that the user holds. After all, he or she could still need the
original Element for some reason.
As the example above shows, he neither sees the original nor
the new element.