Re: [lxml-dev] Object tree API on top of lxml - first candidate for "lxml.elementlib"?

26 Jun 2006

      Hi Stefan,

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on
25.06.2006 21:05:16:
...
Hi Holger,
Holger Joukl wrote:
...
I'd like to add an (arguably :-) "even-more-pythonic" API layer on top of
lxml, enabling the dot (.) operator syntax to navigate through the tree,
similar to amara or gnosis.xml.objectify, plus the possibility to assign
simple Python builtin types transparently.
For my purposes, element.text is regarded as the element data, and
subelement access without namespace qualification simply falls back to the
parent element's namespace if no qualified name was given, i.e.
getattr(elt, 'foo') --> returns children of elt with tagname
{<ns-qualification of elt>}foo
getattr(elt, '{myURI}foo') --> returns children of elt with tagname
{myURI}foo
E.g.:
...
...
...
>>> tree
<etree._ElementTree object at 0x403170>
>>> tree.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'etree._ElementTree' object has no attribute 'foo'
>>> tree.getroot().party
[<Element {http://www.fpml.org/2005/FpML-4-2}party at 401660>, <Element {http://www.fpml.org/2005/FpML-4-2}party at 401690>]
>>> tree.getroot().party[0]
<Element {http://www.fpml.org/2005/FpML-4-2}party at 401660>
>>> tree.getroot().party[0].partyId
<Element {http://www.fpml.org/2005/FpML-4-2}partyId at 401630>
>>> tree.getroot().party[0].partyId.foo = 187873
>>> tree.getroot().party[0].partyId.foo
<Element {http://www.fpml.org/2005/FpML-4-2}foo at 401690>
>>> tree.getroot().party[0].partyId.foo()
187873
>>> etree.tostring(tree.getroot().party[0])
'<party id="PartyA">\n\t\t<partyId>Party A<foo>187873</foo></partyId>\n\t</party>\n\t'
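For readers who want to experiment with the lookup rule described above,
here is a minimal sketch. It is not the implementation discussed in this
thread: it assumes the standard library's xml.etree.ElementTree instead of
lxml so that it runs anywhere, and the wrapper class Node and its helper
methods are hypothetical names. Calling a node returns the raw text here,
with no type guessing.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper giving dot access to child elements."""

    def __init__(self, element):
        # Write through __dict__ to bypass our own __setattr__.
        self.__dict__["_element"] = element

    def _qualify(self, name):
        # Names already in "{uri}tag" form are used unchanged; otherwise
        # the namespace of the wrapped element (if any) is prepended.
        if name.startswith("{"):
            return name
        tag = self._element.tag
        if tag.startswith("{"):
            return tag[: tag.index("}") + 1] + name
        return name

    def _children(self, name):
        tag = self._qualify(name)
        return [child for child in self._element if child.tag == tag]

    def __getattr__(self, name):
        matches = [Node(child) for child in self._children(name)]
        if not matches:
            raise AttributeError(name)
        # A single match is returned directly, several matches as a list.
        return matches[0] if len(matches) == 1 else matches

    def __setattr__(self, name, value):
        # Reuse the first matching child, or create one, then store the
        # value as its text.
        found = self._children(name)
        if found:
            child = found[0]
        else:
            child = ET.SubElement(self._element, self._qualify(name))
        child.text = str(value)

    def __call__(self):
        # The element data is simply the element's text.
        return self._element.text
```

With this, Node(root).party[0].partyId.foo = 187873 creates a foo child in
the namespace of partyId, mirroring the session above.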
I think that's an interesting API to have, especially since a lot of Python
XML libraries support this. I could imagine having a package
"lxml.elementlib" as a collection of generic Element classes that implement
certain extended APIs.
Before I start writing something like this myself, could you contribute
your implementation for this purpose? It shouldn't be very complex anyway.
If you do, please provide it in pure Python. And, if you want to be really
helpful, you could come up with some test cases similar to what you find in
src/lxml/tests/test_*.py or even some doctests to prove that it works as
expected.
Stefan
I'd be happy to contribute my implementation, but currently this is only at
the evaluation stage. Many API issues are still open; e.g.

- implement the special math methods to allow things like
  rootElt.subElt.a + rootElt.subElt.b, delegating the actual operation to
  the underlying simple Python type?
- for rootElt.subElt.a maybe even just return the simple Python value it
  contains instead of the ElementBase-derived object instance a, iff it
  does not have children itself?
- how to determine the simple Python value from elt.text? I'm thinking of
  using a pluggable "guesser" here that will be set by a module-level
  function and allows the user to implement the rules. This guesser will
  expect an Element and return the "simple Python value of this element".
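
To make the first and third open questions concrete, here is one possible
shape, offered as a sketch only. None of these names exist in lxml:
default_guesser, set_guesser and the wrapper class Value are hypothetical,
the int-then-float-then-text rule is just one conceivable default, and the
standard library's xml.etree.ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET


def default_guesser(element):
    """Assumed default rule: try int, then float, else return the text."""
    text = (element.text or "").strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


_guesser = default_guesser


def set_guesser(func):
    """Module-level hook: install any callable taking an Element."""
    global _guesser
    _guesser = func


class Value:
    """Hypothetical wrapper delegating addition to the guessed value."""

    def __init__(self, element):
        self.element = element

    def pyval(self):
        # Ask the currently installed guesser for the simple Python value.
        return _guesser(self.element)

    def __add__(self, other):
        if isinstance(other, Value):
            other = other.pyval()
        return self.pyval() + other

    def __radd__(self, other):
        return other + self.pyval()
```

The remaining special methods (__sub__, __mul__, ...) would follow the same
pattern. Whether attribute access should return the plain value directly
for leaf elements, as the second question asks, is left open here.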

...

My motivation:
We want to migrate a Python toolkit used for interfacing issues that is
heavily based on the commercial TIB/Rendezvous messaging middleware. The
internal data format is structured RvMsg data as provided by the TIB API.
lxml would be a (hot!) candidate to come up with something more powerful,
since e.g. RvMsg does not support element attributes, and lxml adds great
features like XPath, XSLT,...
However, there are also downsides: the RvMsg data can practically be used
just like a simple Python class instance, making use of the simple Python
builtin types. Also, it is very fast.

In short:
If we decide on the lxml way (which is likely) I can come up with
everything you mention, though it will take some time w.r.t. testing as
this will become production code in a banking environment.

About pure Python, though: my first tests with naive Pyrex code and naive
Python code (practically just copy&paste) show the Pyrex version about 3x
faster than the pure Python version, and speed will be an issue.

If we drop lxml and go for another solution (currently unlikely) I can
still give all my evaluation code to you.

Would that be ok for you?

Holger

The contents of this  e-mail are confidential. If you are not the named
addressee or if this transmission has been addressed to you in error,
please notify the sender immediately and then delete this e-mail.  Any
unauthorized copying and transmission is forbidden. E-Mail transmission
cannot be guaranteed to be secure. If verification is required, please
request a hard copy version.