Stefan Behnel wrote:
Martijn Faassen wrote:
The risk exists in the current setup that an algorithm is used somewhere inside libxml2 that takes the parent, and that we break the expectations of this with the current approach.
I think it should be possible to create a _fakeRootDoc variety that actually changes the parent pointer temporarily as well, returning the original parent so we can put it back when we destroy the root doc. As these two are actually part of an atomic operation I think this should work, correct? That would at least eliminate concerns about possible algorithms somewhere in libxml2 using the parent pointer.
The _fakeRootDoc/_destroyFakeDoc pair is atomic in the sense that they wrap calls to libxml2/libxslt within a single API call.
AFAICT, all actions that I wrapped with _fakeRootDoc do not modify the original document, so this is about pure read access. So, as long as they do not try to access the parent of any of the children of the new root node (that is the only problematic part), everything works just fine and completely as one would expect. If they do, however, they will *read* the parts of the original document that are not part of the ElementTree. This should not pose any fatal problem either, although this may result in unexpected behaviour if you assumed an ElementTree to build an independent document. If we do not give people this idea, even that case should work as expected.
True. I guess if we look at ElementTree, the philosophy exists that (parts of) a tree can be shared between elementtree. This will be visible when they're mutated. The problem with lxml is that lxml somewhere *does* have a notion of a document. Nodes can move around and disappear from one place, appear in another, can move between documents, etc. How much do we want to hide this notion from the user? We cannot fully do it, as it'll pop up in some places.
That said, since the calls actually are atomic, it should be possible to store the original parent link somewhere, replace the parent links and restore them afterwards. You could even leave the node where it is and store *its* sibling links and parent link, and then NULL them in place during the operation. Or copy the node and then copy it back afterwards.
You wouldn't have to provide a variety of functions, though, the modifications should work everywhere. Which approach would you prefer?
I'm in favor to have the parent link retained and restored in a local variable always. This because this is least likely to lead to problems with libxml2, which we'd otherwise be handing inconsistent trees. This may work *now* for what we've tested, but it may not work always. I still need to think about the implications of hiding the concept of the libxml2 document from users though. I hope my thoughts about this rather fundamental change don't frustrate you. I also feel I need to work this through before we adopt the other patches, as it changes so much. To add to the frustration, I'll be away on another trip the first half of next week and won't be able to work on lxml. I will continue to think this issue over though. Regards, Martijn