Hi,

Stefan wrote:
Note that the content of the XML file that your code is designed to process did not change at all. It's just that some entirely unrelated content was added, in a completely different and unrelated namespace. And it was just externally added to the input data, or maybe just some tiny portion of it, without telling you or your code about it. Especially in places with optional content, where different namespaces are already a little more common than elsewhere, this is fairly likely to go unnoticed.
I find this kind of behaviour dangerous enough to restrict the "magic" in the API to what is easy to understand and predict.
Any magic namespace-prefix-based lookup scheme can be dangerous in a similar vein, IMHO. E.g.:
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:b="B">
...   <a:x>1</a:x>
...   <b:x>2</b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
2
If you now change one namespace prefix in the XML document from xmlns:b to xmlns:ns_b:
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:ns_b="B">
...   <a:x>1</a:x>
...   <ns_b:x>2</ns_b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/objectify.pyx", line 231, in lxml.objectify.ObjectifiedElement.__getattr__
  File "src/lxml/objectify.pyx", line 450, in lxml.objectify._lookupChildOrRaise
AttributeError: no such child: b_x
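For comparison, a lookup that spells out the namespace URI instead of the prefix does not care which prefix a document happens to use. A minimal sketch using the Clark-notation item access that objectify already supports, run against both documents above; the constant names and the lookup_b_x() helper are made up for illustration only:

```python
from lxml import objectify

# The same two documents as above, differing only in the prefix bound to "B".
DOC_PREFIX_B = (
    '<a:root xmlns:a="A" xmlns:b="B">'
    '<a:x>1</a:x><b:x>2</b:x><x>3</x>'
    '</a:root>'
)
DOC_PREFIX_NS_B = (
    '<a:root xmlns:a="A" xmlns:ns_b="B">'
    '<a:x>1</a:x><ns_b:x>2</ns_b:x><x>3</x>'
    '</a:root>'
)


def lookup_b_x(xml_text):
    """Hypothetical helper: fetch the child 'x' in namespace URI 'B'."""
    root = objectify.fromstring(xml_text)
    # "{namespace-uri}localname" is keyed on the URI, not on the prefix.
    return root["{B}x"]


if __name__ == "__main__":
    for doc in (DOC_PREFIX_B, DOC_PREFIX_NS_B):
        print(lookup_b_x(doc))
```

Both documents give the same child here, since only the prefix differs between them and the URI "B" is what the lookup is keyed on.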
Again, with such a prefix-based lookup the very same code would suddenly cease to work, while the XML document remains semantically identical. You'd get an exception in the best case, or silently ignore data in the worst case.

That aside: Volker wrote:
[...] Debugging becomes a great hassle if you are not able, e.g. in your PyCharm IDE, to navigate the XML tree your parser is currently processing. Even worse if some nodes do not even seem to exist. [...] It is not that I would like a more convenient way to address the data. To address the data I use xpath. It is purely the fact that I cannot use the objectified data in a debugger while debugging that drives me mad.
I admit I don't fully understand the issue (I don't use PyCharm and don't know how it presents objects in debugging). To me, it seems easy enough to just do something like

>>> list(root.iterchildren())
[1, 2, 3]
or
>>> print(objectify.dump(root))  # see also objectify.enable_recursive_str()
{A}root = None [ObjectifiedElement]
    {A}x = 1 [IntElement]
    {B}x = 2 [IntElement]
    x = 3 [IntElement]
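If the namespace of each child matters while stepping through code, the two ideas can be combined into something that is easy to evaluate in a debugger's expression window. A rough sketch, assuming a helper named qualified_children() that is not part of lxml:

```python
from lxml import objectify


def qualified_children(elem):
    """Hypothetical debugging helper (not part of lxml).

    Returns (tag, value) pairs for all element children in document
    order, with tags in "{namespace-uri}localname" notation.
    """
    pairs = []
    for child in elem.iterchildren():
        if not isinstance(child.tag, str):
            continue  # skip comments and processing instructions
        if isinstance(child, objectify.ObjectifiedDataElement):
            value = child.pyval  # Python value of a data element
        else:
            value = None  # structural element without a data value
        pairs.append((child.tag, value))
    return pairs


if __name__ == "__main__":
    root = objectify.fromstring(
        '<a:root xmlns:a="A" xmlns:b="B">'
        '<a:x>1</a:x><b:x>2</b:x><x>3</x>'
        '</a:root>'
    )
    print(qualified_children(root))
```

Unlike attribute access, this lists children from every namespace, and a child without a namespace simply shows up under its bare local name.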
Does PyCharm use elem.__dict__ or dir(elem) to present an object's attributes in debugging? Then maybe a way to address OP's issue might be to populate elem.__dict__ not only with the element children from the same namespace but with all children, while attribute lookup would *still* only find children from elem's namespace. I.e. instead of
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A">
...   <a:x>1</a:x>
...   <x>3</x>
... </a:root>""")
>>> root.__dict__
{'x': 1}
__dict__ would yield
>>> root.__dict__  # not how it works today!
{'{A}x': 1, '{}x': 3}
...making all children appear in e.g. dir(), keeping existing getattr behavior:
>>> root.x
1
Maybe such a __dict__ would lessen the "child visibility issue" in debugging? It would be a breaking change, of course, and it would make __dict__ usage more surprising and arguably more "non-standard" compared to regular Python objects IMO, since it would contain names that are not valid Python identifiers.

From a cursory glance over the implementation, this looks like it should be possible in theory. But I'm not really convinced we should do this. Maybe the debugger/IDE can just be taught to give more helpful output? All the information is there in the first place...

Holger