[lxml] lxml.objectify2: More fun, namespaces, pythonic

7 Mar 2022

      Dear lxml Users!

I am developing lxml.objectify2 (lxml.o2). lxml.o2 has three objectives:

  * making lxml more pythonic
  * introducing robust namespaced properties
  * making lxml more fun

*Following the old ways*

Consider the following XML document.

xml_str = '''\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
   <obj:c1 a1="A1" a2="A2" other:a3="A3">
     <obj:c2>0</obj:c2>
     <obj:c2>1</obj:c2>
     <obj:c2>2</obj:c2>
   </obj:c1>
   <obj:c1>
     <other:c2>3</other:c2>
     <other:c2>5</other:c2>
     <obj:c2>2</obj:c2>
   </obj:c1>
   <obj:c1>
     <other:c2>42</other:c2>
   </obj:c1>
</obj:root>'''

Please notice that the tags obj:c1, obj:c2 and other:c2 each occur several 
times as children of one parent under the same {ns}name.
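To make the multiplicity concrete, here is a minimal sketch that counts the children per {ns}name. It uses only the standard library's xml.etree.ElementTree, not lxml, and a compact copy of the document above without the attributes. The helper name child_multiplicity is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Compact copy of the example document (attributes omitted).
XML = """\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>
  <obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>
  <obj:c1><other:c2>42</other:c2></obj:c1>
</obj:root>"""


def child_multiplicity(element):
    """Count the direct children of `element` per qualified {ns}name tag."""
    return Counter(child.tag for child in element)


root = ET.fromstring(XML)
root_counts = child_multiplicity(root)          # children of obj:root
second_c1_counts = child_multiplicity(root[1])  # children of the second obj:c1

print(root_counts)
print(second_c1_counts)
```

The second obj:c1 is the interesting case: it holds children named c2 in two different namespaces, so the local name alone does not identify them.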

Here is a glance at the data as processed by lxml.o (standard lxml.objectify), 
seen from the PyCharm IDE.

https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-23-02-33.pn...

You may notice that no multiplicity is shown at all. lxml.o is quite 
limited and not really pythonic. Therefore any Python IDE will struggle 
to represent data processed by lxml.o.

*Following the new ways*

Let's use lxml.objectify2 instead.

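from lxml import etree
# Assumption: ObjectifyElementClassLookup is the class from lxml.objectify.
from lxml.objectify import ObjectifyElementClassLookup
from lxml.objectify2 import ObjectifiedElement2

# Build a lookup that creates ObjectifiedElement2 instances for tree elements.
obj2_lookup = ObjectifyElementClassLookup(tree_class=ObjectifiedElement2)

parser = etree.XMLParser()
parser.set_element_class_lookup(obj2_lookup)

node = etree.XML(xml_str, parser=parser)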

A look from the PyCharm debugger into the data structure processed by 
lxml.o2:

https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-22-34-10.pn...

As you can see, lxml.o2 handles multiple children with the same qualified 
tag by assigning an "[index]" to each of them.

*<rant>Yeah, that is nice screenwork, but this will never work in 
code?</rant>*
...
...
...
node.obj_c1[2].obj_c2
[3]
Here the call to

node.obj_c1

returns a list. Python then takes over and picks the requested element by index.

*<rant>OK, but this will not work with getattr</rant>*
...
...
...
getattr(node, 'obj_c1[0]').obj_c2
[0, 1, 2]

Here lxml.o2 selects element [0] itself, really fast, in C space.
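For readers who want to play with the idea without building the branch, the two access styles can be imitated in pure Python. This is only a sketch of the behaviour described above, not the lxml.o2 implementation (which does the work in C). It uses xml.etree.ElementTree from the standard library, a compact copy of the example document, and a hand-written prefix table; the names Node, PREFIXES and _INDEXED are assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

XML = """\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1><obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2></obj:c1>
  <obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>
  <obj:c1><other:c2>42</other:c2></obj:c1>
</obj:root>"""

# Hypothetical prefix table; ElementTree does not keep the document's prefixes.
PREFIXES = {"obj": "objectified", "other": "otherNS"}

# Matches attribute names of the form "obj_c1[0]".
_INDEXED = re.compile(r"^(\w+)\[(\d+)\]$")


class Node:
    """Illustrative wrapper around an ElementTree element."""

    def __init__(self, element):
        self.element = element

    def _children(self, name):
        # Split "<prefix>_<name>" and look up the namespace of the prefix.
        prefix, _, local = name.partition("_")
        if prefix not in PREFIXES or not local:
            raise AttributeError(name)
        tag = "{%s}%s" % (PREFIXES[prefix], local)
        return [Node(child) for child in self.element.findall(tag)]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        match = _INDEXED.match(name)
        if match is None:
            return self._children(name)  # plain name: list of all matches
        children = self._children(match.group(1))
        index = int(match.group(2))
        if index >= len(children):
            raise AttributeError(name)
        return children[index]           # indexed name: a single child


node = Node(ET.fromstring(XML))

# Style 1: the attribute returns a list, Python does the indexing.
second_other = [int(c.element.text) for c in node.obj_c1[1].other_c2]

# Style 2: the index is part of the attribute name passed to getattr.
first = getattr(node, "obj_c1[0]")
first_values = [int(c.element.text) for c in first.obj_c2]

print(second_other, first_values)
```

The second style works because getattr hands any string to __getattr__, including one that is not a valid Python identifier.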


*<rant>OK, and where is the catch?</rant>*

To implement this functionality we need to ensure that two rules are 
followed by the user.

1) If there are elements without a namespace, a default namespace has to 
be defined.

2) Any access to a "tag" has to be qualified, with the exception of tags 
in the default namespace.

node.<namespace>_<name>

     or, with the default namespace:

node.<name>
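As a rough illustration of the two rules, the following sketch translates an attribute name into a qualified {ns}name. It is not taken from the lxml.o2 sources. The function name to_clark and the shape of the nsmap dictionary (prefix to namespace URI, with the key None for the default namespace) are assumptions for this example.

```python
def to_clark(attr_name, nsmap):
    """Translate "<prefix>_<name>" or a bare "<name>" into "{uri}name".

    `nsmap` maps prefixes to namespace URIs; the key None, if present,
    holds the default namespace.
    """
    prefix, sep, local = attr_name.partition("_")
    if sep and local and prefix in nsmap:
        # Rule 2: qualified access through a known prefix.
        return "{%s}%s" % (nsmap[prefix], local)
    if None in nsmap:
        # Exception to rule 2: unqualified names use the default namespace.
        return "{%s}%s" % (nsmap[None], attr_name)
    # Rule 1 violated: unqualified access without a default namespace.
    raise AttributeError("%r needs a prefix or a default namespace" % attr_name)


nsmap = {None: "defaultNS", "obj": "objectified", "other": "otherNS"}

qualified = to_clark("obj_c1", nsmap)
unqualified = to_clark("c1", nsmap)
print(qualified, unqualified)
```

In this sketch a tag in the default namespace whose name itself starts with a known prefix and an underscore (say, obj_x) would be read as a qualified name, so such names are ambiguous.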

If these rules are too much for you, go back to lxml.objectify and be happy.

*<rant>Ah, go away. Where do you find such nice XML?</rant>*

Hm. In the real world I have never seen XML documents as simple as the 
ones in the lxml.objectify tests.

But I am aware that lxml.o2 will have to be tested thoroughly.


*<rant>You will never convince all the users of lxml to change to 
lxml.o2</rant>*

That is true, and I am not even trying. lxml.o2 is an alternative to lxml.o 
for certain use cases.

You are welcome to rant at me :-)

You are also welcome to help with the development of lxml.o2. This is a 
spare time job for me.

If you do not have the time to help, you may simply say here that you 
like lxml.o2.

lxml.o2 lives at

https://github.com/Inqbus/lxml

in the branch

https://github.com/Inqbus/lxml/tree/objectify_prefix

Cheers,

Volker

-- 
=========================================================
    inqbus Scientific Computing    Dr.  Volker Jaenisch
    Hungerbichlweg 3               +49 (8860) 9222 7 92
    86977 Burggen                  https://inqbus.de
=========================================================