[lxml-dev] lxml - DOM
Hi,

I would like to implement a DOM wrapper for Libxml2. We use a Delphi wrapper for Libxml2 heavily at work, and it would be nice to have something similar on the Python side, for easy scripting. I talked with Martijn some time ago about such an attempt, and he pointed me to the already existing initial source of an lxml DOM wrapper. After consulting the implementation of pxdom, I started an initial implementation. Since I still consider myself a Python newbie and was new to Pyrex, I came up with code that is more focused on experimenting with those languages than on fitting into the existing lxml infrastructure.

The main difference between the lxml approach and this experiment is that it uses an additional layer of classes that map to Libxml2's nodes. With such a layer, which already takes care of the peculiarities of the Libxml2 library, setting up a DOM layer on top felt quite easy. None of this is new; it is similar to lxml's etree _NodeBase and _DocumentBase classes, with the difference that instances of these accessor classes (or of specialized subclasses) would sit in the "_private" field of a Libxml2 node instead of a _ProxyRef structure.

This has the following pros:

1. Such accessor classes can hold additional information, like the parent node of Libxml2's namespace nodes (Libxml2 has no field for this). That information then does not have to be recomputed by every interface object; it is stored only once and managed in one place. Another scenario is a workflow where you modify the node tree via the DOM API but want to serialize it in an ElementTree fashion: the fact that namespace declarations probably need to be rearranged, because DOM was used beforehand, could be recorded in the document accessor object.

2. One can design a user interface that is detached from any internal lxml classes and therefore from their methods. (I don't know whether Python or Pyrex has the concept of "protected" methods, i.e. methods visible only to the code of the class itself.
If so, then this argument does hold.) This way, internal methods wouldn't interfere with user interface methods.

3. One can also use pure Python (at least I hope so; I haven't tried it myself :-)) to create new user APIs. So if you need a custom XML API, you can use the accessor classes from Python and pick the functionality you need. This holds as long as you don't intend to use fancy XML operations that the accessor classes don't support. In this respect the accessors act more like Libxml2 Python bindings than like a specific API.

The cons:

1. An additional layer of objects will be slower:

   etree:
     - 1 _ProxyRef structure per interface object
     - 1 _DocumentBase, _NodeBase, etc. per interface object

   accessors:
     - 1 accessor object per node
     - 1 _ProxyRef structure per interface object
     - 1 interface object

This is all an experiment for me, and I have no intention of changing lxml's current philosophy. So if anyone is interested in a DOM API within lxml, I'd appreciate comments on the sanity of such an approach, and I would change direction if it doesn't fit into lxml.

Note that the DOM API in that module is written in Python. I should rewrite it in Pyrex for better performance, but currently I don't know how to use class constants with extension types (is there a way?); I need them for the node types in the "Node" class.

I'll attach the current code (Martijn, it is almost the same as what I sent you 2-3 weeks ago), together with a modified tree.pyd and the test script I used for playing with it. Also note that it needs Libxml2 2.6.20, because it relies on a function that is new in that release.

Greetings,
Kasimier
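To make the layering concrete, here is a minimal pure-Python sketch of the idea under stated assumptions. `RawNode` and `RawNs` are hypothetical stand-ins for libxml2's node and namespace structs (only the fields used here), and `Accessor` and `Node` are illustrative names, not lxml classes. The sketch shows one accessor created per node and cached on `_private`, the owner element of a namespace declaration recorded in that accessor (since the namespace struct itself has no parent field), and DOM node-type constants declared as plain class attributes. In pure Python such constants are trivial; whether Pyrex extension types allow the same is the open question above and is not addressed here.

```python
# Hypothetical pure-Python model of the accessor layer; not lxml/libxml2 API.


class RawNode:
    """Stand-in for a libxml2 node struct (hypothetical, for illustration)."""

    def __init__(self, name, node_type, parent=None):
        self.name = name
        self.type = node_type
        self.parent = parent
        self.children = []
        self.ns_defs = []      # namespace declarations made on this element
        self._private = None   # field left free for application data
        if parent is not None:
            parent.children.append(self)


class RawNs:
    """Stand-in for a libxml2 namespace struct: note it has no parent field."""

    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href
        self._private = None


class Accessor:
    """One object per raw node, cached on the node's _private field."""

    def __init__(self, raw):
        self.raw = raw
        self.ns_parent = None  # owner element; only used for namespace nodes

    @classmethod
    def of(cls, raw):
        # Create lazily, then always hand back the same cached accessor.
        if raw._private is None:
            raw._private = cls(raw)
        return raw._private

    def namespaces(self):
        # Record each declaration's owner where it is discovered, so no
        # interface object has to search for it again later.
        result = []
        for ns in self.raw.ns_defs:
            acc = Accessor.of(ns)
            acc.ns_parent = self.raw
            result.append(acc)
        return result


class Node:
    """Thin DOM-style interface object that delegates to an accessor."""

    # W3C DOM nodeType values as ordinary class constants.
    ELEMENT_NODE = 1
    TEXT_NODE = 3
    DOCUMENT_NODE = 9

    def __init__(self, accessor):
        self._acc = accessor

    @property
    def nodeType(self):
        return self._acc.raw.type

    @property
    def nodeName(self):
        return self._acc.raw.name

    @property
    def parentNode(self):
        parent = self._acc.raw.parent
        return None if parent is None else Node(Accessor.of(parent))

    @property
    def childNodes(self):
        return [Node(Accessor.of(c)) for c in self._acc.raw.children]


# Usage: build a tiny tree of raw nodes and view it through the DOM layer.
doc = RawNode("#document", Node.DOCUMENT_NODE)
root = RawNode("root", Node.ELEMENT_NODE, doc)
RawNode("#text", Node.TEXT_NODE, root)
root.ns_defs.append(RawNs("x", "urn:example"))

dom_root = Node(Accessor.of(root))
print(dom_root.nodeName)                                    # root
print([c.nodeType for c in dom_root.childNodes])            # [3]
print(dom_root.parentNode.nodeName)                         # #document
print(Accessor.of(root).namespaces()[0].ns_parent is root)  # True
```

Each DOM access creates a fresh interface object, while the accessor behind it is shared through `_private`, which mirrors the object counts listed under the cons.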