[lxml-dev] lxml - DOM
Hi,

I would like to implement a DOM wrapper for Libxml2. We heavily use a Delphi wrapper for Libxml2 at work, and for me it would be nice to have a similar thingy on the Python side - for easy scripting.

I talked with Martijn some time ago about such an attempt and he directed me to the already existing initial source for an lxml DOM wrapper. After having consulted the implementation of pxdom, I started an initial implementation. Since I still consider myself a Python newbie and was new to Pyrex, I came up with code more focused on experimenting with those languages than on fitting into the already existing lxml infrastructure.

The main difference between the lxml approach and this experiment is that it uses an additional layer of classes to map to Libxml2's nodes. With such a layer, which already takes care of the peculiarities of the Libxml2 library, setting up a DOM layer on top felt quite easy. None of this is new, and it is similar to lxml's etree _NodeBase or _DocumentBase classes, with the difference that instances of such classes, or of specialized derived classes, would sit on the "_private" field of a Libxml2 node, rather than a _ProxyRef structure.

This has the following pros:

1. Such accessor classes can hold additional information, like the parent node of Libxml2's namespace nodes (Libxml2 does not have a field for this), so this information need not be obtained for every interface, but is stored only once and managed in one place. Another scenario could be a workflow where you modify the node tree via the DOM API, but want to serialize it in an ElementTree fashion; the fact that namespace declarations probably need to be rearranged, since we used DOM previously, could be saved in the document accessor object.

2. One can design a user interface which is detached from any internal lxml classes and therefore from their methods. (I dunno if Python or Pyrex has the concept of "protected" methods, i.e. methods visible only to the code of the classes themselves. If so, then this argument does hold.) So internal methods wouldn't interfere with user interface methods.

3. One can use pure Python as well (at least I hope so, didn't try it myself :-)) to create new user APIs. So if you need a custom XML API, then you can use the accessor classes via Python and pick the functionality you need. This holds true as long as you do not intend to use some fancy XML operations not supported by the accessor classes. So here the accessors act more like Libxml2 Python bindings than a specific API.

The cons:

1. An additional layer of objects will be slower:

   etree:
     1 _ProxyRef structure per interface
     1 _DocumentBase, _NodeBase, etc. per interface object

   accessors:
     1 accessor object per node
     1 _ProxyRef structure per interface object
     1 interface object

This is all an experiment for me and I don't have any intention to modify lxml's current philosophy; so if anyone is interested in a DOM API in the house of lxml: I appreciate comments about the sanity of such an approach and would change my direction if it does not fit into lxml.

Note that the DOM API is written in Python in that module. I should change this for better performance and write it in Pyrex, but currently I dunno how to use class constants (is there a way?) with extension types - needed for the node-types in the "Node" class.

I'll attach the current code (Martijn, it is almost the same as what I sent you 2-3 weeks ago), together with a modified tree.pyd and the test script I used for playing with it. Ah, and it needs Libxml2 2.6.20, due to a new Libxml2 function.

Greetings,
Kasimier
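
The layering described in this message can be illustrated with a small pure-Python sketch. It is not taken from the attached code or from lxml: `RawNode`, `Accessor`, `for_node` and the `extra` dictionary are hypothetical names, and `RawNode` is only a stand-in for a Libxml2 node with its "_private" field. The sketch assumes one accessor object per node, stored on the node itself, with DOM-style interface objects created on top of it. The node-type values are the ones defined by the W3C DOM specification, written here as ordinary Python class attributes.

```python
# Hypothetical sketch; none of these class names come from lxml or Libxml2.

class RawNode:
    """Stand-in for a Libxml2 node; `_private` mimics the C struct field."""

    def __init__(self, name, node_type, parent=None):
        self.name = name
        self.type = node_type
        self.parent = parent
        self.children = []
        self._private = None  # slot where the accessor object is stored
        if parent is not None:
            parent.children.append(self)


class Accessor:
    """One accessor per raw node; a place for extra per-node state."""

    def __init__(self, raw):
        self.raw = raw
        self.extra = {}  # e.g. the owner element of a namespace node

    @classmethod
    def for_node(cls, raw):
        # Reuse the accessor already stored on the node, if there is one.
        if raw._private is None:
            raw._private = cls(raw)
        return raw._private


class Node:
    """DOM-style interface object built on top of an accessor."""

    # Node-type codes as defined by the W3C DOM specification.
    ELEMENT_NODE = 1
    ATTRIBUTE_NODE = 2
    TEXT_NODE = 3
    DOCUMENT_NODE = 9

    def __init__(self, raw):
        self._acc = Accessor.for_node(raw)

    @property
    def nodeType(self):
        return self._acc.raw.type

    @property
    def nodeName(self):
        return self._acc.raw.name

    @property
    def parentNode(self):
        parent = self._acc.raw.parent
        return Node(parent) if parent is not None else None

    @property
    def childNodes(self):
        return [Node(child) for child in self._acc.raw.children]


root = RawNode("root", Node.ELEMENT_NODE)
child = RawNode("item", Node.ELEMENT_NODE, parent=root)

first = Node(child)
second = Node(root).childNodes[0]  # a second interface object for `child`
```

Here `first` and `second` are distinct interface objects, but both share the single accessor stored on `child`, which is where per-node information would be kept only once.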
Hello,
> I would like to implement a DOM wrapper for Libxml2. We heavily use a Delphi wrapper for Libxml2 at work and for me it would be nice to have a similar thingy on the Python side - for easy scripting.
I don't know if you're aware of it, but there's a DOM wrapper for libxml2 called libxml2dom which I've been developing on top of the low-level Python bindings included in the libxml2 distribution. You can find details here:

http://www.python.org/pypi/libxml2dom

I know that this isn't specifically relevant to lxml, and it may well be the case that your wrapper is more comprehensive - my aim was to cover the most important features of the DOM in my work - but it's a pure Python solution which may just about cover your needs.

Paul

P.S. I'm lurking on this list really to see if there's anything I can learn about libxml2 that can be applied in my own work, but it should perhaps also be said that given the fairly straightforward mapping of low-level Python wrapper functions to libxml2 API functions, there may be some merit in looking at the libxml2dom code if there's any peculiar behaviour in libxml2 that you're wondering about. I remember things being said about serialisation, for example...
participants (2)
- cazic
- Paul Boddie