Hi Holger!

Thank you very much for the fast response.

On 28.02.22 at 08:41, Holger.Joukl@LBBW.de wrote:
> The reason for this is that obviously {http://www.isotc211.org/2005/gco}CharacterString is not a valid Python identifier and it makes sense to restrict unqualified lookup to children from the same namespace.
I would like to disagree with this part:

> and it makes sense to restrict unqualified lookup to children from the same namespace
What does the namespace of a node have in common with the namespace of one of its subnodes? Nothing. It is quite common in XML to borrow elements from other namespaces.

Other namespace-based Python libraries, for instance RDFlib, solve this problem generically by adding the namespace prefix to the Python property name:

    {http://www.isotc211.org/2005/gco}CharacterString -> gco_CharacterString

This works like a charm. I have not hit a single corner case with it.

The problem lies buried deep in the nature of the lxml objectify implementation. Objectify does not really transform the XML into a real Python instance hierarchy (as RDFlib does), but routes all attribute access via function calls to the libxml2 C core. On the one hand this is desired behavior, since one can change the XML on the fly and some of the changes are visible both in the XML and in the objectified representation. On the other hand, the information about which namespace a node belongs to is not persistent in the node and therefore cannot be used for lookup.

This can easily be seen in lxml/objectify.pyx, line 414ff:

    cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                             const_xmlChar* href,
                                             const_xmlChar* name,
                                             Py_ssize_t index):
        cdef tree.xmlNode* (*next)(tree.xmlNode*)
        if index >= 0:
            next = cetree.nextElement
        else:
            index = -1 - index
            next = cetree.previousElement
        while c_node is not NULL:
            if c_node.type == tree.XML_ELEMENT_NODE and \
                   _tagMatches(c_node, href, name):
                index = index - 1
                if index < 0:
                    return c_node
            c_node = next(c_node)
        return NULL

To find the desired sibling, the code loops over all children and matches (parentNamespace, propertyName) against each of them.

The correct operation of _findFollowingSibling should IMHO be:

Make a lookup on all children using the Python property name only. If exactly one match is found, return it. If no match or more than one match is found, no answer is possible.
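As a small illustration of the prefix-based naming mentioned above ({http://www.isotc211.org/2005/gco}CharacterString -> gco_CharacterString), here is a minimal pure-Python sketch. It is not taken from RDFlib or lxml; the helper names split_clark and prefixed_name, the PREFIXES table and the gmd namespace URI are assumptions made up for this example.

```python
import re

def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None for unqualified tags."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag

def prefixed_name(tag, prefixes):
    """Map a Clark-notation tag to 'prefix_local' using a uri -> prefix dict.

    Hypothetical helper: raises KeyError if the namespace has no known prefix.
    """
    uri, local = split_clark(tag)
    name = local if uri is None else f"{prefixes[uri]}_{local}"
    # Hyphens and dots are legal in XML names but not in Python identifiers.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)

# Assumed prefix table; only the gco URI appears in this thread.
PREFIXES = {
    "http://www.isotc211.org/2005/gco": "gco",
    "http://www.isotc211.org/2005/gmd": "gmd",
}

print(prefixed_name("{http://www.isotc211.org/2005/gco}CharacterString", PREFIXES))
```

Such a mapping needs a prefix table to be known up front, which is one reason a fallback on the bare local name may be the less invasive change for objectify.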
I extended _findFollowingSibling to

    cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                             const_xmlChar* href,
                                             const_xmlChar* name,
                                             Py_ssize_t index):
        cdef tree.xmlNode* (*next)(tree.xmlNode*)
        cdef tree.xmlNode* start_node
        cdef tree.xmlNode* result_node
        cdef int found = 0

        start_node = c_node

        if index >= 0:
            next = cetree.nextElement
        else:
            index = -1 - index
            next = cetree.previousElement

        # search with namespace
        while c_node is not NULL:
            if c_node.type == tree.XML_ELEMENT_NODE and \
                   _tagMatches(c_node, href, name):
                index = index - 1
                if index < 0:
                    return c_node
            c_node = next(c_node)

        # search without namespace
        c_node = start_node
        while c_node is not NULL:
            if c_node.type == tree.XML_ELEMENT_NODE and c_node.name == name:
                index = index - 1
                if index < 0:
                    result_node = c_node
                    found += 1
            c_node = next(c_node)

        # check if only one result is found
        if found == 1:
            return result_node

        return NULL

Sorry for my clumsy Cython, but it works perfectly well. I also kept the existing behavior of looking up in the parent namespace first.
    >>> node.fileIdentifier.CharacterString
    '4157d397-e2c3-4e6e-8a84-0712aa9c1162'
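For readers who do not want to build the branch, the lookup order described above (parent namespace first, then a unique match on the local name) can be mimicked in pure Python. This is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, the function lookup_child and the sample document are made up for illustration, the gmd namespace URI is assumed, and the index handling of the Cython code is left out.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"  # assumed namespace of the parent elements
GCO = "http://www.isotc211.org/2005/gco"

# Made-up sample document modelled on the example in this thread.
DOC = f"""
<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:fileIdentifier>
    <gco:CharacterString>4157d397-e2c3-4e6e-8a84-0712aa9c1162</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>
""".strip()

def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None for unqualified tags."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag

def lookup_child(parent, name):
    """Hypothetical lookup: parent namespace first, then unique local name."""
    children = [c for c in parent if isinstance(c.tag, str)]
    parent_ns, _ = split_clark(parent.tag)
    qualified = f"{{{parent_ns}}}{name}" if parent_ns else name

    # Pass 1: child in the same namespace as the parent (current behavior).
    for child in children:
        if child.tag == qualified:
            return child

    # Pass 2: match on the local name only, accepted only if unambiguous.
    matches = [c for c in children if split_clark(c.tag)[1] == name]
    return matches[0] if len(matches) == 1 else None

root = ET.fromstring(DOC)
file_id = lookup_child(root, "fileIdentifier")   # found in pass 1 (gmd)
print(lookup_child(file_id, "CharacterString").text)  # found in pass 2 (gco)

# Two children with the same local name in different foreign namespaces:
# the lookup is ambiguous, so no answer is given.
ambiguous = ET.Element(f"{{{GMD}}}parent")
ET.SubElement(ambiguous, f"{{{GCO}}}value")
ET.SubElement(ambiguous, "{urn:example:other}value")
print(lookup_child(ambiguous, "value"))
```

Returning nothing in the ambiguous case keeps the fallback from silently picking one of several candidates.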
I would really appreciate it if someone could test this proof of concept:

https://github.com/Inqbus/lxml
Branch: better-objectify-attributes <https://github.com/Inqbus/lxml/tree/better-objectify-attributes>

If I get positive feedback, I will come up with a pull request.

Cheers,

Volker

-- 
=========================================================
   inqbus Scientific Computing    Dr. Volker Jaenisch
   Hungerbichlweg 3               +49 (8860) 9222 7 92
   86977 Burggen                  https://inqbus.de
=========================================================