Hi,
I would like to implement a DOM wrapper for Libxml2. We use a Delphi
wrapper for Libxml2 heavily at work, and it would be nice to have a
similar thingy on the Python side for easy scripting.
I talked with Martijn some time ago about such an attempt, and he
directed me to the already existing initial source for an lxml DOM
wrapper. After consulting the implementation of pxdom, I started an
initial implementation. Since I still consider myself a Python newbie
and was new to Pyrex, I came up with code that is more focused on
experimenting with those languages than on fitting into the existing
lxml infrastructure. The main difference
between the lxml approach and this experiment is that the experiment
uses an additional layer of classes, which I call accessors, that map
to Libxml2's nodes. With such a layer, which already takes care of the
peculiarities of the Libxml2 library, setting up a DOM layer on top
felt quite easy. None of this is new; it is similar to lxml's etree
_NodeBase or _DocumentBase classes, with the difference that instances
of these classes (or of specialized derived classes) would sit in the
"_private" field of a Libxml2 node rather than in a _ProxyRef
structure.
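A minimal pure-Python model of this layering may make it more concrete. It is only a sketch under stated assumptions: FakeXmlNode stands in for a libxml2 xmlNode (only its name, children and _private slot are modelled), and NodeAccessor, get_accessor and DomElement are hypothetical names, not part of lxml or of the attached code. The real Pyrex version would also have to manage reference counts and drop the accessor when libxml2 frees the node, which this model ignores.

```python
class FakeXmlNode:
    """Stand-in for a libxml2 xmlNode (hypothetical): only the fields used here."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self._private = None  # libxml2's xmlNode reserves _private for application data
        if parent is not None:
            parent.children.append(self)


class NodeAccessor:
    """Hypothetical accessor: one per libxml2 node, hides libxml2 quirks."""

    def __init__(self, c_node):
        self._c_node = c_node

    @property
    def name(self):
        return self._c_node.name

    def child_accessors(self):
        return [get_accessor(child) for child in self._c_node.children]


def get_accessor(c_node):
    """Return the accessor stored in c_node._private, creating it on first use."""
    if c_node._private is None:
        c_node._private = NodeAccessor(c_node)
    return c_node._private


class DomElement:
    """Hypothetical DOM interface object; it talks only to the accessor layer."""

    def __init__(self, accessor):
        self._acc = accessor

    @property
    def tagName(self):
        return self._acc.name

    @property
    def childNodes(self):
        return [DomElement(acc) for acc in self._acc.child_accessors()]


root = FakeXmlNode("root")
FakeXmlNode("a", root)
FakeXmlNode("b", root)

elem = DomElement(get_accessor(root))
print([child.tagName for child in elem.childNodes])  # ['a', 'b']
print(get_accessor(root) is get_accessor(root))      # True: one accessor per node
```

The DOM object never touches the node directly; everything goes through the accessor, which is created lazily and cached in _private, so every interface object for that node shares it.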
This has the following pros:
1. Such accessor classes can hold additional information, like the
parent node of Libxml2's namespace nodes (Libxml2 has no field for
this), so this information need not be obtained separately for every
interface; it is stored only once and managed in one place. Another
scenario is a workflow where you modify the node tree via the DOM
API but want to serialize it in an ElementTree fashion: the fact that
namespace declarations probably need to be rearranged, because DOM
was used beforehand, could be recorded in the document accessor
object.
2. One can design a user interface that is detached from any internal
lxml classes and therefore from their methods, so internal methods
wouldn't interfere with user interface methods. (I don't know whether
Python or Pyrex has the concept of "protected" methods, visible only
to the code of the class itself. If it does, this argument may not
hold.)
3. One can also use pure Python (at least I hope so; I didn't try it
myself :-)) to create new user APIs. If you need a custom XML API,
you can use the accessor classes from Python and pick the
functionality you need, as long as you don't intend to use fancy XML
operations that the accessor classes don't support. In this case the
accessors act more like Libxml2 Python bindings than like a specific
API.
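For the first point, here is a rough sketch of how accessors could keep the namespace parent and the "namespaces need rearranging" flag in one place. All names (FakeXmlNs, NamespaceAccessor, DocumentAccessor, ns_need_rearranging) are hypothetical; FakeXmlNs models only the fact that libxml2's xmlNs has no parent field, and the identity-keyed dict stands in for whatever lookup the real code would use on the C pointer.

```python
class FakeXmlNs:
    """Stand-in for libxml2's xmlNs; like the real struct, it has no parent field."""

    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href


class NamespaceAccessor:
    """Hypothetical accessor that remembers the element declaring the namespace."""

    def __init__(self, c_ns, parent):
        self.c_ns = c_ns
        self.parent = parent  # determined once, then shared by all interfaces


class DocumentAccessor:
    """Hypothetical per-document accessor holding document-wide state."""

    def __init__(self):
        self._ns_accessors = {}           # FakeXmlNs -> NamespaceAccessor (keyed by identity)
        self.ns_need_rearranging = False  # set by DOM edits, read before serializing

    def ns_accessor(self, c_ns, declaring_element):
        """Return the single accessor for c_ns, recording its parent on first use."""
        acc = self._ns_accessors.get(c_ns)
        if acc is None:
            acc = NamespaceAccessor(c_ns, declaring_element)
            self._ns_accessors[c_ns] = acc
        return acc

    def dom_modified(self):
        # After DOM-style edits, namespace declarations may not be where an
        # ElementTree-style serializer expects them, so remember that here.
        self.ns_need_rearranging = True


doc = DocumentAccessor()
ns = FakeXmlNs("x", "urn:example")
first = doc.ns_accessor(ns, "root-element")
again = doc.ns_accessor(ns, "some-other-element")
print(first is again, first.parent)  # True root-element
doc.dom_modified()
print(doc.ns_need_rearranging)       # True
```

Any interface object asking for the namespace later gets the same accessor and the same stored parent, instead of recomputing it.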
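On the "protected" question in the second point: as far as I know, Python does not enforce access control. A single leading underscore is only a naming convention, and a double leading underscore triggers name mangling, which avoids accidental name clashes between a base class and its subclasses but hides nothing. In Pyrex, methods declared with cdef in an extension type can only be called from Pyrex/C code, not from Python, which may come closer to what is meant. The class names below are hypothetical.

```python
class Accessor:
    """Hypothetical internal base class."""

    def _reconcile(self):
        # Single underscore: "internal" by convention only; still callable.
        return "internal helper"

    def __lookup(self):
        # Double underscore: renamed to _Accessor__lookup when the class is compiled.
        return "accessor lookup"

    def run(self):
        return self.__lookup()  # compiled as self._Accessor__lookup()


class Element(Accessor):
    """User-facing class that happens to pick the same method name."""

    def __lookup(self):
        # Renamed to _Element__lookup, so it does not clash with the base class.
        return "user lookup"


e = Element()
print(e.run())                 # accessor lookup
print(e._Element__lookup())    # user lookup
print(hasattr(e, "__lookup"))  # False
print(e._reconcile())          # internal helper
```

So name mangling keeps internal and user-level methods from overriding each other, but a separate interface layer is still the only way to keep internal methods out of the user's view entirely.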
The cons:
1. An additional layer of objects will be slower, since more objects
are created for each node that is exposed through an interface:
etree:
  1 _ProxyRef structure per interface object
  1 _DocumentBase, _NodeBase, etc. per interface object
accessors:
  1 accessor object per node
  1 _ProxyRef structure per interface object
  1 interface object
This is all an experiment for me, and I have no intention of changing
lxml's current philosophy. So if anyone is interested in a DOM API
within lxml, I would appreciate comments on the sanity of such an
approach, and I would change direction if it does not fit into lxml.
Note that the DOM API in that module is written in Python. I should
rewrite it in Pyrex for better performance, but currently I don't
know how to define class constants for extension types (is there a
way?), which are needed for the node types in the "Node" class.
I'll attach the current code (Martijn, it is almost the same as what I
sent you 2-3 weeks ago), together with a modified tree.pyd and the
test script I used for playing with it.
Ah, and it needs Libxml2 2.6.20, since it uses a Libxml2 function that
is new in that version.
Greetings,
Kasimier