Dear lxml users!
I am developing lxml.objectify2 (lxml.o2). lxml.o2 has three objectives:
Imagine the following XML document:
xml_str = '''\
<obj:root
xmlns:obj="objectified" xmlns:other="otherNS">
<obj:c1 a1="A1" a2="A2" other:a3="A3">
<obj:c2>0</obj:c2>
<obj:c2>1</obj:c2>
<obj:c2>2</obj:c2>
</obj:c1>
<obj:c1>
<other:c2>3</other:c2>
<other:c2>5</other:c2>
<obj:c2>2</obj:c2>
</obj:c1>
<obj:c1>
<other:c2>42</other:c2>
</obj:c1>
</obj:root>'''
Please note that the tags obj:c1 and obj:c2/other:c2 each occur as multiple
children with the same {ns}name.
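To make this multiplicity explicit, here is a small sketch using Python's standard-library xml.etree.ElementTree (not lxml, and not lxml.o2). It counts the children of each element per qualified name in Clark notation ({namespace-URI}localname). For brevity, the document is the one above without attributes and indentation whitespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The example document from above, without attributes and indentation whitespace.
XML = (
    '<obj:root xmlns:obj="objectified" xmlns:other="otherNS">'
    '<obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>42</other:c2></obj:c1>'
    '</obj:root>'
)

root = ET.fromstring(XML)

# ElementTree reports every tag in Clark notation: "{namespace-URI}localname".
root_counts = Counter(child.tag for child in root)
c1_counts = [Counter(child.tag for child in c1) for c1 in root]

print(root_counts)  # Counter({'{objectified}c1': 3})
for counts in c1_counts:
    print(dict(counts))
```

The second obj:c1 holds two {otherNS}c2 children next to one {objectified}c2 child, which is the case where the namespace has to be part of the name.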
Here is a glance at the data processed by lxml.o (standard
lxml.objectify), as seen from the PyCharm IDE.
https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-23-02-33.png/image_view_fullscreen
You may notice that the multiplicity is not shown at all. lxml.o is
quite limited here and not really pythonic. As a result, any Python IDE
will struggle to represent data processed by lxml.o.
Following the new ways
Let's use lxml.objectify2 instead:
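from lxml import etree
from lxml.objectify import ObjectifyElementClassLookup
from lxml.objectify2 import ObjectifiedElement2

# Use ObjectifiedElement2 as the element class for all tree elements.
obj2_lookup = ObjectifyElementClassLookup(tree_class=ObjectifiedElement2)
parser = etree.XMLParser()
parser.set_element_class_lookup(obj2_lookup)
node = etree.XML(xml_str, parser=parser)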
A look from the PyCharm debugger into the data structure processed by lxml.o2:
https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-22-34-10.png/image_view_fullscreen
As you can see, lxml.o2 handles multiple children with the same qualified
tag by assigning an "[index]" to each of them.
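The "[index]" labels can be imitated in plain Python. The following sketch is hypothetical and is not lxml.o2's code: it uses ElementTree, assumes a fixed prefix map PREFIXES (ElementTree does not keep the prefixes of the parsed document), and numbers siblings that share a qualified tag.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = (
    '<obj:root xmlns:obj="objectified" xmlns:other="otherNS">'
    '<obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>42</other:c2></obj:c1>'
    '</obj:root>'
)

# Assumed URI -> prefix map; ElementTree only keeps namespace URIs.
PREFIXES = {"objectified": "obj", "otherNS": "other"}


def split_clark(tag):
    """Split '{uri}name' into (uri, name); uri is '' for tags without a namespace."""
    if tag.startswith("{"):
        uri, name = tag[1:].split("}", 1)
        return uri, name
    return "", tag


def indexed_labels(elem):
    """Label each child '<prefix>_<name>[i]', numbering siblings with the same qualified tag.

    Tags without a namespace are not handled (they would raise KeyError here).
    """
    seen = defaultdict(int)
    labels = []
    for child in elem:
        uri, name = split_clark(child.tag)
        labels.append(f"{PREFIXES[uri]}_{name}[{seen[child.tag]}]")
        seen[child.tag] += 1
    return labels


root = ET.fromstring(XML)
print(indexed_labels(root))     # ['obj_c1[0]', 'obj_c1[1]', 'obj_c1[2]']
print(indexed_labels(root[1]))  # ['other_c2[0]', 'other_c2[1]', 'obj_c2[0]']
```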
<rant>Yeah, that is nice screen work, but this will never work in code!</rant>
>>> node.obj_c1[2].obj_c2
[3]
Here the call to node.obj_c1 returns a list. Python then takes over
and selects the desired element from that list by its index.
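A pure-Python sketch of this idea, assuming a hypothetical wrapper class Node on top of ElementTree (not lxml.o2's implementation): an attribute named <prefix>_<name> returns a plain list of matching children, and ordinary list indexing does the rest. The prefix map NSMAP is also an assumption.

```python
import xml.etree.ElementTree as ET

XML = (
    '<obj:root xmlns:obj="objectified" xmlns:other="otherNS">'
    '<obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>42</other:c2></obj:c1>'
    '</obj:root>'
)

NSMAP = {"obj": "objectified", "other": "otherNS"}  # assumed prefix -> URI map


class Node:
    """Hypothetical wrapper: attribute '<prefix>_<name>' returns a list of matching children."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, attr):
        # Only called when normal lookup fails; split at the first underscore,
        # so prefixes must not contain '_' (local names may).
        prefix, sep, name = attr.partition("_")
        if not sep or prefix not in NSMAP:
            raise AttributeError(attr)
        tag = f"{{{NSMAP[prefix]}}}{name}"
        return [Node(child) for child in self._elem if child.tag == tag]

    @property
    def text(self):
        return self._elem.text


node = Node(ET.fromstring(XML))
c1_list = node.obj_c1   # a plain Python list with three Node objects
second = c1_list[1]     # Python's own list indexing picks the second obj:c1
print([c.text for c in second.other_c2])  # ['3', '5']
print([c.text for c in second.obj_c2])    # ['2']
```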
<rant>OK, but this will not work with getattr.</rant>
>>> getattr(node, 'obj_c1[0]').obj_c2
[0, 1, 2]
Here lxml.o2 itself selects the element [0], and it does so
really fast in C space.
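For the getattr variant, the index travels inside the attribute name. A hypothetical pure-Python helper select() could parse such a name as follows; lxml.o2 does this part in C, and the regular expression, the prefix map, and the function name below are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

XML = (
    '<obj:root xmlns:obj="objectified" xmlns:other="otherNS">'
    '<obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>'
    '<obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>'
    '</obj:root>'
)

NSMAP = {"obj": "objectified", "other": "otherNS"}  # assumed prefix -> URI map

# '<prefix>_<name>', optionally followed by '[<index>]'
SPEC = re.compile(r"^(?P<prefix>[^_\[\]]+)_(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\])?$")


def select(elem, spec):
    """Resolve 'obj_c1' to a list of children, or 'obj_c1[0]' to a single child."""
    m = SPEC.match(spec)
    if m is None or m["prefix"] not in NSMAP:
        raise AttributeError(spec)
    tag = f"{{{NSMAP[m['prefix']]}}}{m['name']}"
    matches = elem.findall(tag)  # direct children with this Clark-notation tag
    if m["index"] is None:
        return matches
    return matches[int(m["index"])]


root = ET.fromstring(XML)
first_c1 = select(root, "obj_c1[0]")
print([c.text for c in select(first_c1, "obj_c2")])  # ['0', '1', '2']
```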
<rant>OK, and where is the catch?</rant>
To implement this functionality, the user has to follow two rules:
1) If there are elements without a namespace, a default namespace has to be defined.
2) Every access to a tag has to be namespace-qualified, except for tags in the default namespace:
node.<namespace>_<name>
and, for the default namespace:
node.<name>
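A minimal sketch of the two rules, assuming a hypothetical helper to_clark() that maps an attribute name to a Clark-notation tag for ElementTree (this is not lxml.o2's code). Names of the form <prefix>_<name> go to the prefix's namespace; any other name goes to the default namespace, and without a default namespace the lookup fails (rule 1).

```python
import xml.etree.ElementTree as ET

# A document whose unprefixed elements live in a declared default namespace.
XML = '<root xmlns="defaultNS" xmlns:other="otherNS"><c1/><c1/><other:c1/></root>'
NSMAP = {"other": "otherNS"}  # assumed prefix -> URI map


def to_clark(attr, nsmap, default_ns):
    """Map 'prefix_name' to '{uri}name'; any other name to '{default_ns}name'."""
    prefix, sep, name = attr.partition("_")
    if sep and prefix in nsmap:
        return f"{{{nsmap[prefix]}}}{name}"  # rule 2: qualified access
    if default_ns is None:
        # rule 1: unqualified names need a default namespace
        raise AttributeError(f"{attr!r} is unqualified and no default namespace is defined")
    return f"{{{default_ns}}}{attr}"


root = ET.fromstring(XML)
print(len(root.findall(to_clark("c1", NSMAP, "defaultNS"))))        # 2
print(len(root.findall(to_clark("other_c1", NSMAP, "defaultNS"))))  # 1

try:
    to_clark("c1", NSMAP, None)
except AttributeError as exc:
    print("rejected:", exc)
```

In this sketch, a default-namespace element whose local name happens to start with a known prefix and an underscore (for example other_x) would be read as qualified; that is the price of flat attribute names.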
If these rules are too much for you, go back to lxml.objectify and
be happy.
<rant>Ah, go away. Where do you find such nice XML?</rant>
Hmm. In the real world, I have never seen XML documents as simple as those in the lxml.objectify tests.
But I am aware that lxml.o2 will have to be tested thoroughly.
<rant>You will never convince all the users of lxml to change to lxml.o2</rant>
That is true, but I am not even trying. lxml.o2 is an alternative to lxml.o for certain use cases.
You are welcome to rant at me :-)
You are also welcome to help with the development of lxml.o2.
This is a spare-time project for me.
If you do not have the time to help, you can still express your
liking of lxml.o2 here.
lxml.o2 lives at https://github.com/Inqbus/lxml in the branch
Cheers,
Volker
--
=========================================================
inqbus Scientific Computing    Dr. Volker Jaenisch
Hungerbichlweg 3               +49 (8860) 9222 7 92
86977 Burggen                  https://inqbus.de
=========================================================