[lxml-dev] Custom element class lookup mechanisms
Hi all,

as I was working on the C-API anyway (capi branch), I decided to add a little external module with different ways of determining the Python element class for a libxml2 node. The "lxml.elements.classlookup" module currently implements three different ways of doing this:

* ElementDefaultClassLookup always uses the default class
* ElementNamespaceClassLookup is the default namespace lookup mechanism
* AttributeBasedElementClassLookup determines the class by looking up the value of a specific attribute in a dict. It falls back to the default classes.

Other ways are of course possible, so if anyone has an idea what to add, I'm open to suggestions.

An example usage is this:

    from lxml.elements import classlookup
    classlookup.setElementClassLookup(
        classlookup.ElementDefaultClassLookup())

It registers the mechanism that always uses the default class for elements, comments and PIs (yes, I implemented that, too). This disables the namespace class lookup and thus speeds up the plain element object creation by up to 10%.

Example usage for attribute based lookup:

    from lxml import etree
    # IntElement and StrElement are element classes defined elsewhere
    mydict = {'int' : IntElement, 'str' : StrElement}
    classlookup.setElementClassLookup(
        classlookup.AttributeBasedElementClassLookup('pytype', mydict))
    root = etree.XML('<x><a pytype="int">5</a><b pytype="str">test</b></x>')

Internally, the lookup function is registered using the public C-API function "setElementClassLookupFunction()" and must be implemented in Pyrex (or C). It takes an object and the xmlNode* as arguments. The object can be used to keep some state, such as the attribute name and class dict in the AttributeBasedElementClassLookup case. It is registered together with the lookup function, passed as first argument on each call and otherwise ignored by lxml. The return value of the lookup function is a callable Python object (typically a subtype of _Element) that returns an element instance.

The C API itself is briefly described here:

http://codespeak.net/svn/lxml/branch/capi/doc/capi.txt

Hope this is useful,
Stefan
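The attribute-based lookup described in the message above can be illustrated without lxml at all. What follows is only a rough sketch using the standard library's xml.etree.ElementTree, not the classlookup module or the C-API. The names attribute_based_lookup, make_element_factory, parse and the pyval property are made up for this illustration, and the (state, tag, attrib) signature is an assumption that stands in for the real (object, xmlNode*) pair. The state object is bound to the lookup function once and handed back as the first argument on every call.

```python
import xml.etree.ElementTree as ET


class IntElement(ET.Element):
    """Element whose text is read as an integer (hypothetical example class)."""

    @property
    def pyval(self):
        return int(self.text)


class StrElement(ET.Element):
    """Element whose text is read as a string (hypothetical example class)."""

    @property
    def pyval(self):
        return self.text or ""


def attribute_based_lookup(state, tag, attrib):
    """Pick a class from the value of one attribute, else the default class.

    `state` is the opaque object that was bound to this function; here it
    is an (attribute_name, class_dict) pair.
    """
    attribute_name, class_dict = state
    return class_dict.get(attrib.get(attribute_name), ET.Element)


def make_element_factory(lookup_function, state):
    """Bind a lookup function to its state, similar to registering both together."""
    def element_factory(tag, attrib):
        # The state is passed as first argument on each call.
        element_class = lookup_function(state, tag, attrib)
        return element_class(tag, attrib)
    return element_factory


def parse(xml_text, element_factory):
    """Parse a string, creating every element through element_factory."""
    builder = ET.TreeBuilder(element_factory=element_factory)
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    return parser.close()


mydict = {'int': IntElement, 'str': StrElement}
factory = make_element_factory(attribute_based_lookup, ('pytype', mydict))
root = parse('<x><a pytype="int">5</a><b pytype="str">test</b></x>', factory)
```

With this sketch, the root element has no pytype attribute and so gets the default class, while the two children become IntElement and StrElement instances.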
On 7/24/06, Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote:
> Hi all,
>
> as I was working on the C-API anyway (capi branch), I decided to add a little external module with different ways of determining the Python element class for a libxml2 node. The "lxml.elements.classlookup" module currently implements three different ways of doing this:
>
> * ElementDefaultClassLookup always uses the default class
> * ElementNamespaceClassLookup is the default namespace lookup mechanism
> * AttributeBasedElementClassLookup determines the class by looking up the value of a specific attribute in a dict. It falls back to the default classes.
>
> Other ways are of course possible, so if anyone has an idea what to add, I'm open to suggestions.

How about a way to make this setting per-parser instead of global?

--Andy
Andrew Lutomirski wrote:
> On 7/24/06, Stefan Behnel wrote:
>> Hi all,
>>
>> as I was working on the C-API anyway (capi branch), I decided to add a little external module with different ways of determining the Python element class for a libxml2 node. The "lxml.elements.classlookup" module currently implements three different ways of doing this:
>>
>> * ElementDefaultClassLookup always uses the default class
>> * ElementNamespaceClassLookup is the default namespace lookup mechanism
>> * AttributeBasedElementClassLookup determines the class by looking up the value of a specific attribute in a dict. It falls back to the default classes.
>>
>> Other ways are of course possible, so if anyone has an idea what to add, I'm open to suggestions.
>
> How about a way to make this setting per-parser instead of global?
Sure, I thought about that, too (although rather at a per-document level). But that would require changing the signature of the lookup function so that it also receives the document (which, in turn, keeps a reference to its parser). I think that makes sense, so I'll pass the document as well. You can then use a weak-dict to map documents (or parsers) to element classes.

Stefan
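The weak-dict idea from this reply could look roughly like the sketch below. It is plain Python with no lxml involved: Parser, Document, DefaultElement and HtmlElement are hypothetical stand-ins, and the (state, document, node) argument order is an assumption, since the thread does not spell out the extended signature. A weakref.WeakKeyDictionary keyed by parser means that registering a class does not keep the parser alive.

```python
import gc
import weakref


class Parser:
    """Hypothetical stand-in for a parser object."""


class Document:
    """Hypothetical stand-in for a document that references its parser."""

    def __init__(self, parser):
        self.parser = parser


class DefaultElement:
    """Hypothetical default element class."""


class HtmlElement(DefaultElement):
    """Hypothetical custom element class."""


# Parsers are held weakly: an entry vanishes once its parser is gone.
classes_by_parser = weakref.WeakKeyDictionary()


def lookup_element_class(state, document, node):
    """Return the class registered for the document's parser, else the default.

    `state` is the weak dict registered alongside this function. `node` is
    unused here and only kept to mirror the assumed extended signature.
    """
    return state.get(document.parser, DefaultElement)


html_parser = Parser()
plain_parser = Parser()
classes_by_parser[html_parser] = HtmlElement

html_doc = Document(html_parser)
plain_doc = Document(plain_parser)

html_class = lookup_element_class(classes_by_parser, html_doc, None)
plain_class = lookup_element_class(classes_by_parser, plain_doc, None)

# Dropping the last strong references to the parser removes its entry.
del html_doc, html_parser
gc.collect()
remaining_entries = len(classes_by_parser)
```

Keying by document instead of parser would work the same way. The weak dict simply avoids the cleanup that a global registry holding strong references would otherwise need.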
Andrew Lutomirski wrote:
> On 7/24/06, Stefan Behnel wrote:
>> a little external module with different ways of determining the Python element class for a libxml2 node. The "lxml.elements.classlookup" module currently implements three different ways of doing this:
>> [...]
>> Other ways are of course possible, so if anyone has an idea what to add, I'm open to suggestions.
>
> How about a way to make this setting per-parser instead of global?

Here is how to do it:

http://codespeak.net/svn/lxml/branch/capi/doc/elements.txt

Stefan
participants (2)
- Andrew Lutomirski
- Stefan Behnel