
Hi there, Back from my honeymoon I see that Victor has released vlibxml2. Cool! Thanks Victor! I hope that we can still converge it all into a single package -- vlibxml2 aims to do a fairly low-level straightforward mapping for parts of the libxml2 API, as I understand it, while lxml aims to raise the level of the API. The approaches don't need to exclude each other however and pooling them together could help both. Victor, I'm curious about the reasons why you didn't use the existing lxml infrastructure (svn, mailing list, etc) to do what you've been doing? Were you under the impression that what you were aiming at is outside the scope of lxml? I think people would prefer there not be two pyrex based wrappers for libxml2. Is there anything we can change with lxml so that you'd be happy to merge the projects? I see that Victor changed his license from GPL to BSD (though the README.txt of the 0.1.177 release still says it's GPL-ed). That's great; it means no matter what happens with lxml/vlibxml2 convergence, lxml can hopefully at least 'borrow' the memory management code from vlibxml2. Thanks again Victor! Regards, Martijn

Actually - I started working on the vlibxml2 stuff prior to the announcement of the lxml project. I was also negotiating an agreement at my workplace so that I can work on the XML library during office hours as long as the license was a BSD license. All of that has been sorted out now - and I've really got no time to setup everything you folks have already done at codespeak. I'd really like to keep vlibxml2 and lxml separate. This is mostly for technical reasons as the libxml2 library has some really quirky behavior in its API. I'd actually like to rewrite a lot of the vlibxml2 code now that I understand the idioms in libxml2 a little better. I guess you really do need to do it 87 times before you get it right. :) So - in the interest of playing nice with everyone - can I get checkin privs to the lxml SVN repository? vic On 9-Nov-04, at 04:20 AM, Martijn Faassen wrote:

Hey, [Philipp (hi, I'm back!), please read to the bottom for where you come in] Victor Ng wrote:
Actually - I started working on the vlibxml2 stuff prior to the announcement of the lxml project.
Oh, just to make it clear: I announced it back at EuroPython in early June, but it's easy to miss such announcements. The svn and stuff here came after that; the original code is in the Infrae cvs, here: http://cvs.infrae.com/packages/lxml/ Then again, you might've been at it for a while too for all I know. I'm glad I caught you on the pyrex list and we can work together; it's already been beneficial to both of us, I hope.
Cool! Infrae has a 'BSD everything by default' policy, but that's because I co-own the company. :)
All of that has been sorted out now - and I've really got no time to setup everything you folks have already done at codespeak.
Of course that's mostly the work of Holger Krekel, helped by Philipp von Weitershausen; I can't really take much credit for it, just be glad that they're my friends.
So what about a source distribution that contains both libraries (if they're done at all, vlibxml2 is obviously much further in that department)? vlibxml2 contains a lot of very important foundational work concerning memory management that I hope we can get the higher level lxml stuff to use as well after a bit of refactoring. So, what I'm proposing is merging the vlibxml2 into lxml's 'src' directory, but being its own package. What do you think? Of course we'd also need to figure out what to do with the extensions package; I'm not familiar enough with vlibxml2's source layout to know where everything goes. Ideas? If you'd like and can come up with a good name, we could rename the whole 'lxml' distribution into something else that may be more neutral. We could then call it all <foo>.libxml2, <foo>.libxslt, <foo>.dom and <foo>.elementtree. I.e. one top level package to make the namespaces clear, with sub-modules/packages that offer particular functionalities. vlibxml2 would become 'libxml2'. Though perhaps this all promises *too* much API compatibility with the original libxml2/elementtree/etc for us to feel comfortable about?
Sounds very familiar indeed. :) I rewrote parts of lxml a few times already, still haven't gotten it right.
So - in the interest of playing nice with everyone - can I get checkin privs to the lxml SVN repository?
Sure! I'll get Philipp to contact you about it; I'll cc him about it. Hi Philipp! Regards, Martijn

Sounds good to me. Philipp has set me up on SVN at codespeak now. I'll check in my code tonight, make a branch and play around with source layout and maybe we can come to an agreement on how everything should fit together.
Naming is pretty low on my priority list. I don't want to have vlibxml2 renamed to libxml2 mainly because libxml2 will have a superset of features for the foreseeable future. I'm going to take a crack at learning SWIG to see if I can close that gap a little faster though. Maybe close the gap entirely. :) vic

Victor Ng wrote: [snip]
Great!
True: we can always reorganize in the future. The nice thing of having a single outer namespace package is that you're free to name anything in the project whatever you like. I'm doing this with the 'lxml' namespace, basically.
True. Anyway, keep the project namespace package in mind -- it's a pattern one sees in many modern Python projects and it does make avoiding namespace clashes easier.
I've seen SWIG in use for some projects, and while it can work, there's nothing that beats manual control. A predecessor to PyGame, PySDL, used SWIG, but the manually written PyGame is much nicer as it's more Pythonic. The nice thing about Pyrex is that it makes such manual control much easier than using straight C. Then again, the requirements for vlibxml2 may be low-level enough to make SWIG worth looking at. What about things like proper Python unicode strings in the API for instance instead of UTF-8, though? What about sensible Python exception handling when things go wrong? SWIG may have some features to deal with this, but it'll get tricky. Regards, Martijn
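To make the unicode and exception point concrete, here is a minimal sketch of the kind of hand-written wrapper layer being argued for. Everything in it is hypothetical: `fake_get_node_name` merely stands in for a low-level binding that returns a status code and UTF-8 bytes, and `XMLBindingError` is an invented name; neither is part of vlibxml2, lxml or libxml2.

```python
class XMLBindingError(Exception):
    """Hypothetical exception for failures reported by the low-level layer."""


def fake_get_node_name(node_id):
    """Stand-in for a C-level binding: returns (status, UTF-8 encoded bytes)."""
    names = {1: "caf\u00e9".encode("utf-8")}
    if node_id not in names:
        return -1, b""
    return 0, names[node_id]


def get_node_name(node_id):
    """Pythonic wrapper: decode UTF-8 to a unicode string and turn
    error codes into exceptions instead of returning them."""
    status, raw = fake_get_node_name(node_id)
    if status != 0:
        raise XMLBindingError("no node with id %r" % (node_id,))
    return raw.decode("utf-8")


print(get_node_name(1))
try:
    get_node_name(2)
except XMLBindingError as exc:
    print("error:", exc)
```

The caller never sees raw bytes or status codes; that translation is the part that tends to need manual control, whichever tool generates the underlying binding.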

Hey, I managed to get my code checked into the trunk line of lxml just now. I just checked in my old vlibxml2 tree under the trunk of lxml's trunk. How do you want to co-ordinate the source tree changes? Is there an automated build process that I should be aware of? I'm on an OSX machine so I don't have access to valgrind - I was wondering if I can see build results that show that memory leaks are not actually happening. What's the intent of having the branch vs the tag directories in the svn repository? Are tags considered releases? A couple notes about my vlibxml2 - there's a bug in the replaceNode method. I'm not sure exactly what's going on but when I actually use the code in my 'real' project at work I get weird XML output. I'll try to investigate tonight to see exactly what's going on. so many questions, so little time... vic

Victor Ng wrote:
That is a standard convention for svn repository layouts. svn only ever compares directory trees. So, effectively, branches are copies of directories, and so are tags, except that tag copies aren't modified after copying while branches are. Later, when you want to merge a branch, you let svn compare two directory trees and apply the difference to another one. I strongly suggest reading some svn documentation. I can recommend the svn book (http://svnbook.red-bean.com/). If you already know CVS, there's a chapter in there for those who switch... Philipp

I just didn't recognize the terminology for 'tag'. I've got my own repo setup with branches, releases, and trunk. thanks, vic --- "Consequences, Schmonsequences, as long as I'm rich." -- Looney Tunes, Ali Baba Bunny (1957, Chuck Jones) On 12-Nov-04, at 12:44 AM, Philipp von Weitershausen wrote:

Victor Ng wrote:
I managed to get my code checked into the trunk line of lxml just now. I just checked in my old vlibxml2 tree under the trunk of lxml's trunk.
Great, thank you very much! I hope you don't mind if I babble a bit more about source code layout and naming issues; I'd like to have them worked out. Luckily subversion makes it easy to rename and move things!

One issue is that now we have the following layout of the source tree:

    lxml (distribution directory)
      src
        lxml (package)
          dom (module)
          etree (module)
    vlibxml2 (distribution directory)
      src
        extensions (package?)
        victree (this looks new. it's empty however :)
        vlibxml2 (package)

What I was suggesting before was a layout more like this:

    lxml (distribution directory, merge vlibxml2 distribution directory info into this)
      src
        lxml (package)
          dom (module)
          etree (module)
        vlibxml2 (package)
        extensions (package?)

this would basically introduce two top level packages, lxml and vlibxml2. This does make vlibxml2 less easily redistributable separately, but we can still fairly easily pull releases out of this that only contain the vlibxml2 stuff if desired, or we can simply name 'lxml' stuff experimental.

This was a suggestion to make it easy to merge things together; my ideal structure is in fact this one:

    lxml (distribution directory)
      src
        lxml (namespace package)
          vlibxml2 (package)
          dom (module)
          etree (module)

This way, you import anything at all from lxml like this:

    import lxml.vlibxml2

or like this:

    from lxml import vlibxml2

everything is clearly namespace prefixed, reducing the risk of name clashes, and making it entirely clear where everything is coming from. This pattern of a nearly empty 'namespace package' with the project name is a common one used by many Python projects, such as Twisted and Zope 3. It also mirrors the approach taken by Java, though namespacing there is a bit more involved. My experience with this approach in Python is that it's a nice and clean approach. It's possible Victor would prefer another 'umbrella' name than lxml, or otherwise feels uncomfortable merging the projects like this.
Please note that while I started 'lxml', it's explicitly not intended to be my personal project, however -- this is why I'm very happy Victor is joining in; his contribution is arguably more important than mine to date. On a side note, 'extensions' doesn't look like a Python package, right? It is never importable by itself from Python as far as I understand it. This might indicate we might want to place it somewhere else than under 'src'. Regards, Martijn

Hey, I split off a discussion about source code layout into a separate thread, but I'll answer these ones here: Victor Ng wrote: [snip]
How do you want to co-ordinate the source tree changes? Is there an automated build process that I should be aware of?
We don't have an automated build process. I think we should be able to merge your setup.py with the one in lxml already. Mine does its best to shut up gcc about all kinds of warnings in Pyrex generated code. This is not very portable beyond gcc; we need to work out eventually how to pass along different options with other compilers such as on Windows. The Makefile in lxml I believe I took more or less from the Zope 3 project. Most relevant are 'make' and 'make test'.
No automated build process. The thing I do is:

    valgrind --tool=memcheck --suppressions=valgrind-python.supp python2.3 test.py

If we get vlibxml2 integrated into lxml's 'src' directory the tests should be picked up automatically and I can mail a report to the list. Hm... I thought I saw tests in the vlibxml2 I downloaded before, but I cannot seem to find them now. Anyway, the testrunner, test.py, should pick up any tests called test_, typically placed in a 'tests' subpackage to the one being tested. This is an example: http://codespeak.net/svn/lxml/trunk/src/lxml/tests/ Regards, Martijn
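The 'pick up anything called test_ in a tests subpackage' convention can be illustrated with the standard library's unittest discovery. This is a stand-in for, not a copy of, the test.py runner mentioned above; the package name `demo_proj` and its files are made up for the example.

```python
import os
import tempfile
import unittest

# Build a throwaway package with a 'tests' subpackage.
root = tempfile.mkdtemp()
tests_dir = os.path.join(root, "demo_proj", "tests")
os.makedirs(tests_dir)
for d in (os.path.join(root, "demo_proj"), tests_dir):
    with open(os.path.join(d, "__init__.py"), "w"):
        pass

# A module whose name starts with test_ is collected...
with open(os.path.join(tests_dir, "test_sample.py"), "w") as f:
    f.write(
        "import unittest\n\n"
        "class SampleTest(unittest.TestCase):\n"
        "    def test_addition(self):\n"
        "        self.assertEqual(1 + 1, 2)\n"
    )

# ...while other modules in the same directory are never imported.
with open(os.path.join(tests_dir, "helper.py"), "w") as f:
    f.write("raise RuntimeError('helper.py should not be imported')\n")

suite = unittest.TestLoader().discover(
    tests_dir, pattern="test_*.py", top_level_dir=root
)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

A new package dropped next to the existing one gets its tests run without touching the runner, as long as it follows the same naming convention.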

On 12-Nov-04, at 04:07 AM, Martijn Faassen wrote:
I can set up an automated build if anyone is interested.
Sorry about that - missed that when I copied files from my directory to codespeak's. You should see it now. I've cleaned up the tests a little so you'll see that the victree code has no tests around it - I had some bug in the tests so I just deleted them all. I'll resurrect them when it's fixed. :)

Victor Ng wrote: [snip]
Would be interesting. I presume it'd try to compile and run the tests, and send some form of report?
We'll need to discuss what to do about victree.py/etree.pyx anyway. You'll note that etree.pyx does have quite a few tests. What about the source code reorganization issue I discussed? Right now we have two distribution directories, which is rather awkward. Regards, Martijn

I won't be able to touch it until later this week, but I'll just branch the trunk and reorganize everything into one source tree if nobody really minds. There's a couple bugs I need to squash and a good bit of restructuring that I'd like to get done so that I can start using this XML library in my day job. :) As for the automated build, at work, I set up a build process which basically runs distutils, runs the testsuite and then emails the results. Alternately, we could overload the test runner to do some XML/HTML output à la Cruise Control: http://cruisecontrol.sourceforge.net/. I'd really like to get some XML output generated when a test suite is completed - it would really make it easier to do stuff like a web status board. Anyway - I'll concentrate on getting the new file layout sorted out in a couple days. I'll probably mark the victree stuff as 'deprecated' or something. I'm not too interested in keeping it alive if etree does everything anyway. vic --- "Consequences, Schmonsequences, as long as I'm rich." -- Looney Tunes, Ali Baba Bunny (1957, Chuck Jones) On 13-Nov-04, at 04:04 AM, Martijn Faassen wrote:
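A build process of the shape described here (build, run the test suite, mail the results) might look roughly like the sketch below. It is an assumption-laden illustration, not Victor's actual setup: the two steps are placeholder commands that only print a line, the addresses are made up, and the report is composed but not sent.

```python
import subprocess
import sys
from email.message import EmailMessage


def run_step(name, cmd):
    """Run one build step and capture its exit code and combined output."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return name, proc.returncode, proc.stdout


def build_report(results, sender, recipient):
    """Compose a plain-text mail summarising every step."""
    ok = all(code == 0 for _, code, _ in results)
    msg = EmailMessage()
    msg["Subject"] = "build %s" % ("OK" if ok else "FAILED")
    msg["From"] = sender
    msg["To"] = recipient
    sections = [
        "== %s (exit %d) ==\n%s" % (name, code, output)
        for name, code, output in results
    ]
    msg.set_content("\n".join(sections))
    return msg


# Placeholder commands; a real setup would call the project's
# build and test commands here instead.
steps = [
    ("build", [sys.executable, "-c", "print('pretend build')"]),
    ("test", [sys.executable, "-c",
              "import sys; print('pretend tests'); sys.exit(1)"]),
]
results = [run_step(name, cmd) for name, cmd in steps]
report = build_report(results, "builder@example.org", "list@example.org")
print(report["Subject"])
```

Sending the message would be a matter of handing `report` to `smtplib.SMTP(...).send_message()`; writing the same `results` list out as XML instead would give the status-board input mentioned above.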

Victor Ng wrote:
Excellent! Let me know when you want a review and I'll take a look. I hope I will have a bit of time to work on lxml next week.
I'm very curious to hear your thinking on using vlibxml2's system for memory management as a base for the rest of lxml's pyrex based stuff (etree, the dom). Do you think this would be easy or would a fairly large restructuring still be necessary? I guess I need to sit down and try to get some sample code working.
That sounds fine for starters, after the restructuring of the repository is complete so we have one distribution and one test runner to run all tests.
I'll take a look at Cruise Control.
That's certainly my aim; etree is intended to replicate the ElementTree API as well as possible. One part of victree that interested me was the implementation of ElementTree's path evaluation system on top of xpath -- we could fold that into etree.pyx. Regards, Martijn

On 16-Nov-04, at 01:01 PM, Martijn Faassen wrote:
I think the best way to do things would be to sit lxml on top of vlibxml2. We get a couple benefits this way:
- vlibxml2 has a critical 'user' so we can use lxml to drive the features that need to be implemented in vlibxml2. Right now - I'm using vlibxml2 but mainly for reading XML documents and slicing them up with XPath which is a small subset of the functions that lxml will need to
- one place to worry about memory management. I hate this memory management stuff. If we can keep all the garbage collection code in one place, it will make life easier for everyone.
I _have_ to have the vlibxml2 replaceNode function stabilized by this weekend. vic

Victor Ng wrote:
Terminology note: lxml - the name of the package we're all working on (if we agree on this). It's a distribution name, and from the Python perspective a namespace package. vlibxml2 - the low-level binding for libxml2, part of lxml. This is the part of lxml that's done first and foundational to the rest of it. We could even call it 'lxml.foundation'. :) Once you're happy with it, we can promote people to use this. etree - the elementtree implementation, part of lxml dom - the DOM implementation, part of lxml Would you be okay with this terminology? Using this terminology, we'd say lxml.etree, lxml.dom, etc should sit upon the foundation offered by lxml.vlibxml2.
Good point.
And this is one of my main concerns, so excellent point. I want this badly.
I _have_ to have the vlibxml2 replaceNode function stabilized by this weekend.
I'm curious to see what etree (for instance) sitting on vlibxml2 would look like. Much of vlibxml2 consists of work to expose the libxml2 API to Python. Another part makes sure the memory management issue is clear. Would etree (for instance) make use of both parts of vlibxml2 (is memory management a lot easier to tackle if vlibxml2's exposed libxml2 API is used), or would just the latter be enough? Regards, Martijn

On 17-Nov-04, at 05:24 AM, Martijn Faassen wrote:
Sounds fine with me. I like vlibxml2 mostly because it's less typing than foundation. :) That and since I'm on OSX - there's already a Foundation package from Objective-C.
The memory management code shouldn't ever have to be used by anyone other than the vlibxml2 package. In retrospect - I really did the whole thing pretty badly, but I've never really done this before so I'll forgive myself - this time. I can't think of a good reason why etree should just have to use the memory management code directly - most of etree's calls to vlibxml2 should be pretty quick. We'll only be adding 2, maybe 3 levels of Python indirection between an etree user and the underlying C implementation anyway - that shouldn't be so bad. Worst case scenario - we find bottlenecks - we profile and fix them. No big deal. vic

participants (4)
- Fred Drake
- Martijn Faassen
- Philipp von Weitershausen
- Victor Ng