Hi there,
Inspired by discussions with Vic and through browsing the vlibxml2 code,
I've implemented a bit of memory management functionality which, after a
lot of manual debugging, seems to be doing the right thing. So far...
A new addition to the lxml trunk is nodereg, and associated testing
stuff (noderegtest.pyx and test_nodereg.py). nodereg is a system for
registering Python-level node proxies, plus some base classes for the
document and node objects in a typical libxml2 tree wrapper.
The nodereg module functionality can be used to make sure that memory
(in particular libxml2 tree nodes) gets collected when it is possible,
and not before. :) This sounds easy, but it is surprisingly tricky.
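As a rough illustration of the proxy registration idea (plain Python, not
the actual nodereg code; NodeProxy, ProxyRegistry and the integer node ids
are hypothetical stand-ins for the real proxies and libxml2 node pointers),
a registry can hand out one proxy per node without keeping it alive:

```python
import weakref


class NodeProxy:
    """Hypothetical Python-level proxy for a tree node, identified by an int."""

    def __init__(self, node_id):
        self.node_id = node_id


class ProxyRegistry:
    """Maps node ids to live proxies without holding strong references."""

    def __init__(self):
        # Entries disappear automatically once the proxy is collected.
        self._proxies = weakref.WeakValueDictionary()

    def get_proxy(self, node_id):
        # Reuse the existing proxy so each node has at most one live proxy.
        proxy = self._proxies.get(node_id)
        if proxy is None:
            proxy = NodeProxy(node_id)
            self._proxies[node_id] = proxy
        return proxy

    def has_proxy(self, node_id):
        return node_id in self._proxies
```

Once the last outside reference to a proxy goes away, has_proxy() for that
node reports False again, which is the point where the underlying node
might become a candidate for freeing.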
Next:
* Look into hooking in libxml2's memory debugging functionality for
testing. Investigate Vic's code in that area/get Vic's advice.
* Start rewriting etree, dom, or vlibxml2 to use nodereg. This will
likely further evolve nodereg.
* Add more functionality to nodereg. One thing that currently is not
handled is attribute nodes, for instance.
* Optimize nodereg. The strategy currently employed requires, in the
worst case, a lot of full-tree walks to determine whether a node in the
tree can be successfully garbage collected. We need to come up with some
smart algorithm or data structure so that this doesn't happen too often.
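To make the cost in the last item concrete, here is a sketch of the kind of
full-tree walk meant there. It assumes the rule is "a tree can be freed
only if no node in it still has a live proxy"; Node, iter_tree and can_free
are made-up names, not nodereg's API:

```python
class Node:
    """Hypothetical stand-in for a libxml2 tree node."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def iter_tree(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_tree(child)


def can_free(root, has_live_proxy):
    """Return True if no node under root still has a live proxy.

    In the worst case this visits every node, which is the cost that a
    smarter data structure would need to avoid.
    """
    return not any(has_live_proxy(node) for node in iter_tree(root))
```

Every call walks the whole tree in the worst case, so calling it each time
a proxy goes away gets expensive for large documents.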
Another thing I would really like to do is investigate adding weakref
support to Pyrex. Right now I had to first jump through a bit of a hoop
to make it work. Then later on I took a long time debugging an obscure
case where there would be a remaining refcount on an object even if the
only object still pointing to the object was a WeakValueDictionary. I
finally traced it down to Pyrex introducing this. I'm not clear why, but
the base class (which, as defined by Pyrex, was not weak-referenceable)
got involved. This tricked the object into keeping a reference when it
shouldn't, so it was never deallocated.
Being able to just say 'this class can be weakreferenced' in Pyrex
should make this go away.
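For reference, the same distinction can be shown in plain Python. This is
only an analogy and does not reproduce the Pyrex refcount problem: a class
using __slots__ without a __weakref__ slot stands in for an extension type
that is not weak-referenceable, and a plain Python subclass gains weakref
support. Base, WeakrefableNode and is_weakrefable are hypothetical names:

```python
import weakref


class Base:
    # No __weakref__ slot, so instances cannot be weakly referenced,
    # much like an extension type without weakref support.
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class WeakrefableNode(Base):
    # A plain Python subclass without __slots__ gains __dict__ and
    # __weakref__, which makes its instances weak-referenceable.
    pass


def is_weakrefable(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True
```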
Regards,
Martijn
Hi there,
Tonight I attempted a package reorganisation myself, integrating
vlibxml2 into lxml from the other direction, leaving most of lxml's
build scripts and testing framework intact.
My attempts are here:
http://codespeak.net/svn/lxml/branch/lxml-reorg/
The build story works ('make' does the trick). The tests are picked up
by the test runner. I noticed that on reorg-20041120 'make' tries to
install in site-packages directly, which is something that doesn't
happen on my branch. I think it's cleaner not to have to install in
site-packages just for testing, and the test.py runner takes care of
this in lxml.
Another thing I tried to do conscientiously on my branch is use 'svn
move', so that version control history of various files is not lost
during reorganisation. Did you use that, Victor?
Everything is currently in the lxml directory (except for
vlibxml2_mod.so, but I tried placing it there as well and it's an easy
change to setup.py).
The problem is that I can't get vlibxml2 tests to get anywhere near
passing. When running the tests, I get many, many obscure errors along
these lines:
Exception exceptions.NameError: 'vlibxml2' in
<lxml.vlibxml2_subclasses.xmlNodeSub object at 0x404337cc> ignored
Alternate source organisations result in import errors instead.
I tried rearranging the source every which way, but I think the problem
remains that vlibxml2_mod uses (through wrapfuncs.pxi)
vlibxml2_subclasses.py, where vlibxml2_subclasses.py imports from
vlibxml2_mod to actually do the subclassing. This circular setup is
driving me rather crazy.
I suspect that the original vlibxml2 setup had some magic source code
organisation that somehow avoided this problem. I do not know where the
magic is though; Victor, could you perhaps explain?
Preferably I'd like to break this circular import, as it's obviously
hard to maintain. I apparently can't do it. :) Is there any way to
accomplish this? I understand that we need to subclass for
weak-referenceability, but is there any way to do this in .pyx code
directly instead of in Python? Or could we make the weakreffable classes
the root base class, so there's no need for them to inherit from
vlibxml2_mod classes first, and then be used in vlibxml2_mod again?
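One possible way to break such a cycle, sketched in plain Python and shown
in a single file for brevity, is to let the subclass module register its
class with the low-level module so the low-level module never has to import
it. This is an assumption about what might work, not how vlibxml2 does it;
xmlNode, register_node_class and make_node are made-up names:

```python
# --- would live in the low-level module (stand-in for vlibxml2_mod) ---

_node_class = None  # filled in later by whoever subclasses xmlNode


class xmlNode:
    """Hypothetical base node class defined by the low-level module."""

    def __init__(self, name):
        self.name = name


def register_node_class(cls):
    """Tell the low-level module which subclass to instantiate."""
    global _node_class
    if not issubclass(cls, xmlNode):
        raise TypeError("node class must derive from xmlNode")
    _node_class = cls


def make_node(name):
    """Create a node using the registered subclass, if there is one."""
    cls = _node_class if _node_class is not None else xmlNode
    return cls(name)


# --- would live in the Python module (stand-in for vlibxml2_subclasses) ---

class xmlNodeSub(xmlNode):
    """Python-level subclass, e.g. to gain weakref support."""


# Imports only flow one way: this module imports the low-level one
# and registers itself, so there is no import back.
register_node_class(xmlNodeSub)
```

With this arrangement the import dependency points in one direction only,
at the cost of having to make sure the registering module gets imported
before any nodes are created.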
Please help! :)
Regards,
Martijn