[lxml] Re: python lxml.objectify gives no attribute access to gco:CharacterString node

3 Mar 2022

Hi,

Stefan wrote:
...
Note that the content of the XML file that your code is designed to process did not
change at all. It's just that some entirely unrelated content was added, in a
completely different and unrelated namespace. And it was just externally added
to the input data, or maybe just some tiny portion of it, without telling you or your
code about it. Especially in places with optional content, where different
namespaces are already a little more common than elsewhere, this is fairly likely
to go unnoticed.
I find this kind of behaviour dangerous enough to restrict the "magic" in the API to
what is easy to understand and predict.
Any magic namespace prefix-based lookup scheme can be dangerous in a similar vein IMHO:
E.g.
>>> from lxml import objectify
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:b="B">
...   <a:x>1</a:x>
...   <b:x>2</b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
2
If you now change one namespace prefix in the XML document from xmlns:b to xmlns:ns_b:
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:ns_b="B">
...   <a:x>1</a:x>
...   <ns_b:x>2</ns_b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/objectify.pyx", line 231, in lxml.objectify.ObjectifiedElement.__getattr__
  File "src/lxml/objectify.pyx", line 450, in lxml.objectify._lookupChildOrRaise
AttributeError: no such child: b_x
Again, the very same code would suddenly cease to work, while the XML document
remains semantically identical. You'd get an exception in the best case, or
silently ignore data in the worst case.
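
For contrast, here is a minimal sketch of a lookup keyed on the namespace URI rather than the prefix. It relies on objectify accepting a "{uri}name" string via getattr(); the helper name lookup_b_x and the two document constants are made up for illustration. Under these assumptions, both documents give the same result regardless of the prefix used:

```python
from lxml import objectify

# Same content twice; only the prefix bound to namespace "B" differs.
DOC_PREFIX_B = """
<a:root xmlns:a="A" xmlns:b="B">
  <a:x>1</a:x>
  <b:x>2</b:x>
  <x>3</x>
</a:root>"""

DOC_PREFIX_NS_B = """
<a:root xmlns:a="A" xmlns:ns_b="B">
  <a:x>1</a:x>
  <ns_b:x>2</ns_b:x>
  <x>3</x>
</a:root>"""


def lookup_b_x(xml_text):
    """Hypothetical helper: return the value of child x in namespace "B"."""
    root = objectify.fromstring(xml_text)
    # The namespace URI is part of the lookup key; the prefix is irrelevant.
    return getattr(root, "{B}x").pyval


print(lookup_b_x(DOC_PREFIX_B))
print(lookup_b_x(DOC_PREFIX_NS_B))
```

Plain attribute access (root.x) still resolves only within the parent's namespace, i.e. to {A}x here.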

That aside:

Volker wrote:
...
[...]
Debugging becomes a great hassle if you are not able e.g. in your
PyCharm IDE to navigate the XML tree your parser is currently processing.
Even worse if some nodes do not seem to even exist.
[...]
It is not that I like a more convenient way to address the data. To
address the data I use xpath. It is purely the fact that I cannot use
the objectified data in a debugger while debugging, that drives me mad.
I admit I don't fully understand the issue (I don't use PyCharm and don't know how
it presents objects in debugging). To me, it seems easy enough to just do something like
>>> list(root.iterchildren())
[1, 2, 3]
or
>>> print(objectify.dump(root))  # see also objectify.enable_recursive_str()
{A}root = None [ObjectifiedElement]
    {A}x = 1 [IntElement]
    {B}x = 2 [IntElement]
    x = 3 [IntElement]
Does PyCharm use elem.__dict__ or dir(elem) to present an object's attributes
in debugging?
Then maybe a way to address OP's issue might be to populate elem.__dict__ not only with
element children from the same namespace but with all children, while *still*
resolving attribute lookups only against children in elem's namespace.

I.e. instead of
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A">
...   <a:x>1</a:x>
...   <x>3</x>
... </a:root>""")
>>> root.__dict__
{'x': 1}
__dict__ would yield
>>> root.__dict__  # not how it works today!
{'{A}x': 1, '{}x': 3}
...making all children appear in e.g. dir(), while keeping the existing getattr behavior:
>>> root.x
1
Maybe this would lessen the "child visibility issue" in debugging?
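
As a rough illustration only, such a view can be approximated today outside of lxml with a small helper. The function name qualified_children is made up, this is not part of the objectify API, and it is just a sketch of the idea under the assumption that only the first child per qualified name is of interest:

```python
from lxml import etree, objectify


def qualified_children(elem):
    """Hypothetical helper: map '{namespace}localname' to the first such child."""
    children = {}
    for child in elem.iterchildren():
        # Skip comments and processing instructions (their .tag is not a string).
        if not isinstance(child.tag, str):
            continue
        qname = etree.QName(child)
        # Use '{}' for children without a namespace, as in the proposal above.
        key = "{%s}%s" % (qname.namespace or "", qname.localname)
        children.setdefault(key, child)
    return children


root = objectify.fromstring("""
<a:root xmlns:a="A">
  <a:x>1</a:x>
  <x>3</x>
</a:root>""")

mapping = qualified_children(root)
print(sorted(mapping))
```

Calling something like this from a debugger's expression evaluator would show all children without changing how attribute lookup works.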

A breaking change of course, and one that makes __dict__ usage more surprising and
arguably more "non-standard" compared to regular Python objects IMO, since it would
contain names that are not valid Python identifiers.

From a cursory glance at the implementation, this looks like it should be possible in theory.
But I'm not really convinced we should do this.

Maybe the debugger/IDE can just be taught to give more helpful output?
All the information is there in the first place...

Holger


Holger.Joukl＠LBBW.de