[lxml-dev] Memory management redux

Dec. 11, 2004

Hi there,

Inspired by discussions with Vic and by browsing the vlibxml2 code, 
I've implemented a bit of memory management functionality which, after a 
lot of manual debugging, seems to be doing the right thing. So far...

A new addition to the lxml trunk is nodereg, and associated testing 
stuff (noderegtest.pyx and test_nodereg.py). nodereg is a system for 
registering Python-level node proxies, plus some base classes for the 
document and node objects in a typical libxml2 tree wrapper.

The nodereg module functionality can be used to make sure that memory 
(in particular libxml2 tree nodes) gets collected when it is possible, 
and not before. :) This sounds easy, but it is surprisingly tricky.
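To make the idea of registering node proxies a bit more concrete, here is a rough pure-Python sketch. It is not the nodereg code: `Node`, `NodeProxy` and `ProxyRegistry` are hypothetical names invented for illustration, and a plain Python object stands in for a libxml2 tree node. The registry only holds weak references, so a proxy disappears from it once nothing else uses it.

```python
import weakref


class Node:
    """Stand-in for a C-level tree node (hypothetical, pure Python)."""

    def __init__(self, name):
        self.name = name


class NodeProxy:
    """Python-level proxy for one underlying node (hypothetical)."""

    def __init__(self, node):
        # The proxy keeps its node alive for as long as the proxy lives.
        self.node = node


class ProxyRegistry:
    """Maps a node to its live proxy without keeping the proxy alive."""

    def __init__(self):
        # Values are held weakly: an entry vanishes when its proxy dies.
        self._proxies = weakref.WeakValueDictionary()

    def get_proxy(self, node):
        # Reuse the existing proxy if one is still alive, so that each
        # node has at most one proxy at any time.
        proxy = self._proxies.get(id(node))
        if proxy is None:
            proxy = NodeProxy(node)
            self._proxies[id(node)] = proxy
        return proxy

    def has_proxy(self, node):
        return id(node) in self._proxies
```

With something like this, asking twice for the proxy of the same node gives back the same object, and once all references to that proxy are dropped the registry no longer reports one for the node. That is the point at which the underlying node becomes a candidate for freeing.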

Next:

* Look into hooking in libxml2's memory debugging functionality for 
testing. Investigate Vic's code in that area/get Vic's advice.

* Start rewriting etree, dom, or vlibxml2 to use nodereg. This will 
likely further evolve nodereg.

* Add more functionality to nodereg. One thing that currently is not 
handled is attribute nodes, for instance.

* Optimize nodereg. The strategy currently employed requires, in the 
worst case, a lot of full-tree walks to determine whether a node in the 
tree can be successfully garbage collected. We need to come up with some 
smart algorithm/data structure to avoid this having to happen too often.
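The cost in the last point can be illustrated with a small sketch. This is only my guess at the general shape of such a check, not the actual nodereg strategy: `Node`, `has_proxy` and `can_free` are hypothetical names, and a boolean flag stands in for "a Python proxy for this node is still alive".

```python
class Node:
    """Hypothetical tree node; `has_proxy` marks a live Python proxy."""

    def __init__(self, name, has_proxy=False):
        self.name = name
        self.has_proxy = has_proxy
        self.children = []

    def append(self, child):
        self.children.append(child)
        return child


def can_free(root):
    """Return True if no node in the tree under `root` has a live proxy."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.has_proxy:
            # Something in Python still refers into this tree.
            return False
        stack.extend(node.children)
    return True
```

If a check like this runs every time a proxy goes away, each run may visit every node in the tree, which is where the repeated full-tree walks come from. Keeping a per-tree count of live proxies would be one possible way to avoid the walk, but that is an assumption on my part and not something nodereg does today.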

Another thing I would really like to do is investigate adding weakref 
support to Pyrex. Right now I had to first jump through a bit of a hoop 
to make it work. Then later on I took a long time debugging an obscure 
case where there would be a remaining refcount on an object even if the 
only object still pointing to the object was a WeakValueDictionary. I 
finally traced it down to Pyrex introducing this. I'm not clear why, but 
somehow the base class got involved (which was not weakreferenceable as 
defined by Pyrex). This somehow managed to trick the object into keeping 
a reference while it shouldn't, causing it never to be deallocated.

Being able to just say 'this class can be weakreferenced' in Pyrex 
should make this go away.
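As a point of comparison only (this is plain Python, not Pyrex syntax), Python classes that define `__slots__` have a similar opt-in: instances can only be weakly referenced if `'__weakref__'` is listed among the slots. The class and function names below are made up for the example.

```python
import gc
import weakref


class Plain:
    # No '__weakref__' slot: instances cannot be weakly referenced.
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class WeakRefable:
    # Listing '__weakref__' opts the class in to weak references.
    __slots__ = ("value", "__weakref__")

    def __init__(self, value):
        self.value = value


def can_weakref(obj):
    """Return True if a weak reference to `obj` can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


def entry_vanishes():
    """Check that a WeakValueDictionary alone does not keep an object alive."""
    cache = weakref.WeakValueDictionary()
    obj = WeakRefable(1)
    cache["key"] = obj
    del obj          # drop the only strong reference
    gc.collect()     # not needed with refcounting, but harmless
    return "key" not in cache
```

The second function shows the behaviour I was expecting in the obscure case above: once the only remaining reference is the one held by the WeakValueDictionary, the object should be deallocated and its entry should disappear.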

Regards,

Martijn

Martijn Faassen
