Dear lxml users!

I am developing lxml.objectify2 (lxml.o2). lxml.o2 has three objectives:

Following the old ways

Imagine the following XML document.

xml_str = '''\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1 a1="A1" a2="A2" other:a3="A3">
    <obj:c2>0</obj:c2>
    <obj:c2>1</obj:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2>
    <other:c2>5</other:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>42</other:c2>
  </obj:c1>
</obj:root>'''


Please note that obj:c1, obj:c2 and other:c2 each occur several times as children sharing the same {ns}name.
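To make the multiplicity concrete, here is a minimal sketch that counts the children per qualified tag. It deliberately uses only the standard library's xml.etree.ElementTree, so it runs without lxml or lxml.o2; the helper name child_tag_counts is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Same document as above, repeated so the snippet is self-contained.
XML = """\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1 a1="A1" a2="A2" other:a3="A3">
    <obj:c2>0</obj:c2>
    <obj:c2>1</obj:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2>
    <other:c2>5</other:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>42</other:c2>
  </obj:c1>
</obj:root>"""


def child_tag_counts(element):
    """Count the direct children of `element` per {ns}name (hypothetical helper)."""
    return Counter(child.tag for child in element)


root = ET.fromstring(XML)
root_counts = child_tag_counts(root)
per_c1_counts = [child_tag_counts(c1) for c1 in root]

print(dict(root_counts))
for counts in per_c1_counts:
    print(dict(counts))
```

Every tag that is counted more than once is a case where an attribute-style accessor has to decide between returning one element and returning many.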


Here is a glance at the data as processed by lxml.o (standard lxml.objectify), seen from the PyCharm IDE.

https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-23-02-33.png/image_view_fullscreen

You may notice that there is no multiplicity at all. lxml.o is quite limited and not really Pythonic. As a result, any Python IDE will struggle to represent data processed by lxml.o.


Following the new ways

Let's use lxml.objectify2 instead.


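from lxml import etree
# Assumption: the lookup class is the one shipped in lxml.objectify.
from lxml.objectify import ObjectifyElementClassLookup
from lxml.objectify2 import ObjectifiedElement2

# Build a lookup that uses the new element class for tree nodes.
obj2_lookup = ObjectifyElementClassLookup(tree_class=ObjectifiedElement2)

parser = etree.XMLParser()
parser.set_element_class_lookup(obj2_lookup)

# Parse the example document with the lxml.o2-enabled parser.
node = etree.XML(xml_str, parser=parser)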


A look from the PyCharm debugger into the data structure processed by lxml.o2:

https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-22-34-10.png/image_view_fullscreen

As you can see, lxml.o2 handles multiple children with the same qualified tag by assigning an "[index]" to each of them.


<rant>Yeah, that is nice screenwork, but this will never work in code?</rant>

>>> node.obj_c1[2].obj_c2
[3]

Here the call to

node.obj_c1

returns a list. Python's ordinary list indexing then picks the requested element.


<rant>Ok, but this will not work with getattr</rant>

>>> getattr(node, 'obj_c1[0]').obj_c2

[0, 1, 2]

Here lxml.o2 itself selects element [0], which is really fast because it happens in C.
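For readers who want to play with the idea without building the branch, here is a rough pure-Python emulation of both access styles. It is only a sketch and not how lxml.o2 is implemented: the Node class, the NSMAP table and the regular expression are invented for this illustration, and it is built on the standard library's xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical prefix -> namespace URI table for the example document.
NSMAP = {"obj": "objectified", "other": "otherNS"}

# Matches "<prefix>_<name>" with an optional trailing "[<index>]".
_ACCESS = re.compile(
    r"^(?P<prefix>[^_\[\]]+)_(?P<name>[^\[\]]+?)(?:\[(?P<index>\d+)\])?$"
)

XML = """\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1>
    <obj:c2>0</obj:c2>
    <obj:c2>1</obj:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2>
    <other:c2>5</other:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
</obj:root>"""


class Node:
    """Illustrative wrapper only; NOT the lxml.o2 implementation."""

    def __init__(self, element):
        self._element = element

    @property
    def text(self):
        return self._element.text

    def __getattr__(self, attr):
        # Only called for names that are not regular attributes.
        match = _ACCESS.match(attr)
        if match is None or match["prefix"] not in NSMAP:
            raise AttributeError(attr)
        tag = "{%s}%s" % (NSMAP[match["prefix"]], match["name"])
        children = [Node(child) for child in self._element.findall(tag)]
        if match["index"] is None:
            return children  # plain access: always a list
        return children[int(match["index"])]  # "name[i]": one element


root = Node(ET.fromstring(XML))

via_list = [c.text for c in root.obj_c1[0].obj_c2]
via_getattr = [c.text for c in getattr(root, "obj_c1[0]").obj_c2]
print(via_list, via_getattr)
```

In this emulation the "[0]" is parsed out of the attribute name inside __getattr__, whereas root.obj_c1[0] first builds the full list and then lets Python index it.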


<rant>OK, and where is the catch</rant>

To implement this functionality we need to ensure that two rules are followed by the user.

1) If there are elements without a namespace, a default namespace has to be defined.

2) Any access to a "tag" has to be done qualified, with the exception of the default namespace.

    node.<namespace>_<name>

    with default namespace

    node.<name>
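The two rules can be read as a simple name-resolution scheme. The following sketch shows one possible interpretation and is not taken from the lxml.o2 sources: the function to_clark is hypothetical, and the nsmap follows the lxml convention that the key None holds the default namespace.

```python
def to_clark(attr, nsmap):
    """Map an attribute name to a {namespace}name tag (hypothetical helper).

    "<prefix>_<name>" is resolved through `nsmap`; anything else is
    treated as a name in the default namespace stored under the key None.
    """
    prefix, sep, name = attr.partition("_")
    if sep and name and prefix in nsmap:
        return "{%s}%s" % (nsmap[prefix], name)
    default_ns = nsmap.get(None)
    if default_ns is None:
        # Rule 1: unqualified access needs a default namespace.
        raise AttributeError(
            "%r is not qualified and no default namespace is defined" % attr
        )
    return "{%s}%s" % (default_ns, attr)


nsmap = {"obj": "objectified", "other": "otherNS", None: "defaultNS"}

print(to_clark("obj_c2", nsmap))
print(to_clark("other_c2", nsmap))
print(to_clark("c2", nsmap))
```

Under this reading, a name such as my_tag whose part before the underscore is not a known prefix simply falls through to the default namespace.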


If these rules are too much for you, go back to lxml.objectify and be happy.


<rant>Ah, go away. Where do you find such nice XML</rant>

Hm. In the real world I have never seen XML documents as simple as those in the lxml.objectify tests.

But I am aware that lxml.o2 will have to be tested thoroughly.


<rant>You will never convince all the users of lxml to change to lxml.o2</rant>

That is true. But I am not even trying to. lxml.o2 is an alternative to lxml.o for certain use cases.


You are welcome to rant at me :-)

You are also welcome to help with the development of lxml.o2. This is a spare-time project for me.

If you do not have the time to help, you may express your support for lxml.o2 here.


lxml.o2 lives at

https://github.com/Inqbus/lxml

in the branch

https://github.com/Inqbus/lxml/tree/objectify_prefix


Cheers,

Volker








-- 
=========================================================
   inqbus Scientific Computing    Dr.  Volker Jaenisch
   Hungerbichlweg 3               +49 (8860) 9222 7 92
   86977 Burggen                     https://inqbus.de
=========================================================