[lxml-dev] Another (last?) take on proxy deallocation
Hi all,

while Ian Bicking was working on the "lxml.html" trunk, I noticed that the way some of the modules were implemented could crash lxml.etree during garbage collection. I know, Martijn and I have already reimplemented the proxy code a couple of times, each time solving more and more of the encountered problems, but I really hope this is the last time we have to reimplement this.

I already disliked the last way I had to rewrite it, as it required an additional document traversal step each time a document is deallocated. While we accept this behaviour for disconnected tree fragments (a trade-off between different overheads), it should not be necessary for the whole document (at least not more often than required by xmlFreeDoc()).

But the problem is that Python's cyclic garbage collector gives no guarantees about the order in which the collected objects are cleaned up - and libxml2 requires access to the document node when cleaning up a tree fragment. So it is actually required that we clean up all _Element proxies first and *then* free the xmlDoc. So, _Document must really always be deallocated *after* all its _Element proxies have been garbage collected.

I was thinking about a way to do this for a while and experimented with it on a "proxy-deallocation" branch - until I realised that the best way to control the garbage collector was the garbage collection mechanism itself - i.e. reference counting.

So, I checked in a small patch (SVN revision 44623 on the trunk) that simply doubles the ref-counts that an _Element holds to its _Document so that we can control when the ref-count to the document is decreased. lxml.etree now does this explicitly in the tp_dealloc function of the _Element class, *after* cleaning up the proxy, so that the ref-count of the _Document never goes down to 0 before the last of its _Element proxies was deallocated. It is then safe to run xmlFreeDoc() on the libxml2 document from _Document.__dealloc__.
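The ordering argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual lxml code: the classes Document and ElementProxy, the hand-rolled refcount counter and the gc_clear() and dealloc() methods are all hypothetical stand-ins for _Document, _Element, the collector dropping the ordinary reference, and tp_dealloc respectively.

```python
class Document:
    """Hypothetical stand-in for _Document, with a hand-rolled ref-count."""

    def __init__(self, events):
        self.events = events
        self.refcount = 0
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            # In the real code this is where xmlFreeDoc() would be safe to run.
            self.freed = True
            self.events.append("free document")


class ElementProxy:
    """Hypothetical stand-in for an _Element proxy."""

    def __init__(self, doc, tag):
        self.doc = doc
        self.tag = tag
        doc.incref()  # the ordinary reference (assumed droppable by the collector)
        doc.incref()  # the doubled reference, released only in dealloc()

    def gc_clear(self):
        # Models the collector dropping the ordinary reference at a time
        # the proxy does not control.
        self.doc.decref()

    def dealloc(self):
        # Models tp_dealloc: clean up the proxy first, which needs the
        # document, and only then give up the extra reference.
        assert not self.doc.freed, "document freed before its proxy"
        self.doc.events.append("clean up proxy " + self.tag)
        self.doc.decref()


def simulate():
    events = []
    doc = Document(events)
    proxies = [ElementProxy(doc, "a"), ElementProxy(doc, "b")]
    for proxy in proxies:
        proxy.gc_clear()   # count drops from 4 to 2, document stays alive
    for proxy in proxies:
        proxy.dealloc()    # the last dealloc() brings the count to 0
    return events


if __name__ == "__main__":
    for line in simulate():
        print(line)
```

In this model the document is freed only after the last proxy has been cleaned up, whatever happens to the ordinary references in between.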
https://codespeak.net/viewvc/?view=rev&revision=44623

I really like this approach and I also like that it removes the need for document traversal on _Document deallocation. And: it keeps the code on the lxml.html branch from crashing, which is a *really* good sign. :]

So, I will also merge this into the 1.3 branch and release a 1.3.1 soon. We already had a couple of small fixes on the branch, so a bug-fix release next week should nicely improve the code quality of the official release.

Have fun,
Stefan
participants (1)
-
Stefan Behnel