Hi Stefan,

Stefan Behnel <behnel_ml@gkec.informatik.tu-darmstadt.de> wrote on 25.06.2006 21:05:16:
Hi Holger,
Holger Joukl wrote:
I'd like to add an (arguably :-) "even-more-pythonic" API layer on top of lxml, enabling the dot (.) operator syntax to navigate through the tree, similar to amara or gnosis.xml.objectify, plus the possibility to assign simple Python builtin types transparently.
For my purposes, element.text is regarded as the element data, and ns-unqualified subelement access is allowed by simply using the parent ns-prefix, if no qualified name was given, i.e.

    getattr(elt, 'foo') ---> returns children of elt with tagname {<ns-qualification of elt>}foo
    getattr(elt, '{myURI}foo') --> returns children of elt with tagname {myURI}foo
E.g.:
    >>> tree
    <etree._ElementTree object at 0x403170>
    >>> tree.foo
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'etree._ElementTree' object has no attribute 'foo'
    >>> tree.getroot().party
    [<Element {http://www.fpml.org/2005/FpML-4-2}party at 401660>, <Element {http://www.fpml.org/2005/FpML-4-2}party at 401690>]
    >>> tree.getroot().party[0]
    <Element {http://www.fpml.org/2005/FpML-4-2}party at 401660>
    >>> tree.getroot().party[0].partyId
    <Element {http://www.fpml.org/2005/FpML-4-2}partyId at 401630>
    >>> tree.getroot().party[0].partyId.foo = 187873
    >>> tree.getroot().party[0].partyId.foo
    <Element {http://www.fpml.org/2005/FpML-4-2}foo at 401690>
    >>> tree.getroot().party[0].partyId.foo()
    187873
    >>> etree.tostring(tree.getroot().party[0])
    '<party id="PartyA">\n\t\t<partyId>Party A<foo>187873</foo></partyId>\n\t</party>\n\t'
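The lookup rule described above can be sketched in pure Python. This is only an illustration, not the implementation under discussion: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, the names qualify and DotNode are hypothetical, and it assumes (as the session suggests) that a single match is returned directly while several matches come back as a list. Assignment via the dot operator is left out.

```python
import xml.etree.ElementTree as ET


def qualify(parent_tag, name):
    """Resolve an attribute name to a fully qualified tag name (hypothetical helper)."""
    if name.startswith("{"):
        return name  # already qualified, e.g. '{myURI}foo'
    if parent_tag.startswith("{"):
        # Reuse the namespace of the parent element.
        ns = parent_tag[1:].split("}", 1)[0]
        return "{%s}%s" % (ns, name)
    return name  # parent has no namespace either


class DotNode(object):
    """Hypothetical wrapper that adds dot access on top of an Element."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        children = self._element.findall(qualify(self._element.tag, name))
        if not children:
            raise AttributeError(name)
        if len(children) == 1:
            return DotNode(children[0])  # assumption: single match is unwrapped
        return [DotNode(child) for child in children]

    def __call__(self):
        # element.text is regarded as the element data.
        return self._element.text


xml = (
    '<root xmlns="http://www.fpml.org/2005/FpML-4-2">'
    '<party id="PartyA"><partyId>Party A</partyId></party>'
    '<party id="PartyB"/>'
    "</root>"
)
root = DotNode(ET.fromstring(xml))
first_party_id = root.party[0].partyId()
```

Qualified names still go through getattr(node, '{myURI}foo'), since they are not valid Python identifiers.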
I think that's an interesting API to have, especially since a lot of Python XML libraries support this. I could imagine having a package "lxml.elementlib" as a collection of generic Element classes that implement certain extended APIs.
Before I start writing something like this myself, could you contribute your implementation for this purpose? It shouldn't be very complex anyway. If you do, please provide it in pure Python. And, if you want to be really helpful, you could come up with some test cases similar to what you find in src/lxml/tests/test_*.py or even some doctests to prove that it works as expected.
Stefan
I'd be happy to contribute my implementation, but currently this is still at the evaluation stage. Many API issues are still open, e.g.:

- Implement the special math methods to allow things like rootElt.subElt.a + rootElt.subElt.b, delegating the actual operation to the underlying simple Python type?
- For rootElt.subElt.a, maybe even just return the simple Python value it contains instead of the ElementBase-derived object instance a, iff it does not have children itself?
- How to determine the simple Python value from elt.text? I'm thinking of using a pluggable "guesser" here that will be set by a module-level function and allows the user to implement the rules. This guesser will expect an Element and return the "simple Python value of this element".
- ...

My motivation: We want to migrate a Python toolkit used for interfacing issues that is heavily based on the commercial TIB/Rendezvous messaging middleware. The internal data format is structured RvMsg data as provided by the TIB API. lxml would be a (hot!) candidate to come up with something more powerful, as e.g. RvMsg does not support element attributes, plus all the great lxml features like XPath, XSLT,...

However, there are downsides also: The RvMsg data can practically be used just like a simple Python class instance, making use of the simple Python builtin types. Also, it is very fast.

In short: If we decide to go the lxml way (which is likely), I can come up with all you mention, though it will take some time w.r.t. testing, as this will become production code in a banking environment.

About pure Python, though: My first tests with naive Pyrex code and naive Python code (practically just copy&paste) show the Pyrex version about 3x faster than the pure Python version, and speed will be an issue.

If we drop lxml and go for another solution (currently unlikely), I can still give all my evaluation code to you. Would that be ok for you?

Holger
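To make the "guesser" and math-delegation ideas above a bit more concrete, here is a rough sketch under stated assumptions. None of these names exist in lxml: set_guesser, default_guesser, pyval and ValueNode are hypothetical, the int/float/string rule is just one possible default, and the standard library's xml.etree.ElementTree stands in for lxml elements.

```python
import xml.etree.ElementTree as ET

_guesser = None  # module-level hook, replaced via set_guesser()


def default_guesser(element):
    """Assumed default rule: try int, then float, else keep the string."""
    text = element.text
    if text is None:
        return None
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def set_guesser(func):
    """Install a user-supplied guesser; None restores the default."""
    global _guesser
    _guesser = func


def pyval(element):
    """Return the 'simple Python value' of an element."""
    return (_guesser or default_guesser)(element)


class ValueNode(object):
    """Hypothetical wrapper delegating '+' to the guessed Python value."""

    def __init__(self, element):
        self._element = element

    @staticmethod
    def _unwrap(obj):
        return pyval(obj._element) if isinstance(obj, ValueNode) else obj

    def __add__(self, other):
        return pyval(self._element) + self._unwrap(other)

    def __radd__(self, other):
        return self._unwrap(other) + pyval(self._element)


a = ValueNode(ET.fromstring("<a>2</a>"))
b = ValueNode(ET.fromstring("<b>3.5</b>"))
total = a + b
```

The other special methods (__sub__, __mul__, comparisons, ...) would follow the same pattern; whether that is worth it compared to simply returning the plain value for leaf elements is exactly the open question.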