Hi Holger!
Thank you very much for the fast response.
> The reason for this is that obviously {http://www.isotc211.org/2005/gco}CharacterString is not a valid Python identifier and it makes sense to restrict unqualified lookup to children from the same namespace.
I would like to disagree with this part:

> and it makes sense to restrict unqualified lookup to children from the same namespace
What does the namespace of a node have to do with the namespace of one of its child nodes? Nothing. It is quite common in XML to borrow elements from other namespaces.
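Here is a concrete case. In ISO 19139 metadata, a gmd:fileIdentifier element contains a gco:CharacterString child. The sketch below uses the standard library's xml.etree.ElementTree, not lxml.objectify, so it is only an illustration of the namespace mismatch. A lookup that assumes the child inherits the parent's namespace finds nothing. Only the child's own namespace works.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# A gmd element whose only child is borrowed from the gco namespace.
doc = f"""<gmd:fileIdentifier xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gco:CharacterString>4157d397-e2c3-4e6e-8a84-0712aa9c1162</gco:CharacterString>
</gmd:fileIdentifier>"""

parent = ET.fromstring(doc)

# Lookup that inherits the parent's namespace: nothing is found.
inherited = parent.find(f"{{{GMD}}}CharacterString")

# Lookup in the child's own namespace: the element is found.
own = parent.find(f"{{{GCO}}}CharacterString")

print(inherited)  # None
print(own.text)   # 4157d397-e2c3-4e6e-8a84-0712aa9c1162
```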
Other namespace-aware Python libraries, for instance RDFlib, solve this problem generically by adding the namespace prefix to the Python property name:
{http://www.isotc211.org/2005/gco}CharacterString -> gco_CharacterString

This works like a charm; not once have I run into a corner case.

The problem is buried deep in the nature of the lxml.objectify implementation. Objectify does not really transform the XML into a real Python instance hierarchy (as RDFlib does). Instead, it routes all attribute access through function calls into the libxml2 C core. On the one hand this is desirable behavior: you can change the XML on the fly, and the changes are visible both in the XML and in the objectified representation. On the other hand, the information about which namespace a node belongs to is not persisted in the node and therefore cannot be used for lookup. This can easily be seen in lxml/objectify.pyx, lines 414ff:

cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                         const_xmlChar* href,
                                         const_xmlChar* name,
                                         Py_ssize_t index):
    cdef tree.xmlNode* (*next)(tree.xmlNode*)
    if index >= 0:
        next = cetree.nextElement
    else:
        index = -1 - index
        next = cetree.previousElement
    while c_node is not NULL:
        if c_node.type == tree.XML_ELEMENT_NODE and \
                _tagMatches(c_node, href, name):
            index = index - 1
            if index < 0:
                return c_node
        c_node = next(c_node)
    return NULL

To find the desired sibling, the code loops over all children and matches (parentNamespace, propertyName) against them. IMHO, _findFollowingSibling should instead work like this:
Look up all children by the Python property name only, ignoring the namespace. If exactly one match is found, return it. If no match or more than one match is found, no unambiguous answer is possible.
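The rule above can be modelled outside lxml with xml.etree.ElementTree. This sketch covers only the proposed semantics and is not the Cython patch. The helper names and the urn:a / urn:b namespaces are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix from a Clark-notation tag."""
    return tag.rsplit("}", 1)[-1]


def find_unique_by_local_name(parent: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the only child whose local name equals `name`.

    Returns None when no child or several children match, because the
    unqualified name is then ambiguous.
    """
    matches = [child for child in parent if local_name(child.tag) == name]
    return matches[0] if len(matches) == 1 else None


# Hypothetical example document with children from two namespaces.
doc = """<root xmlns:a="urn:a" xmlns:b="urn:b">
  <a:title>one</a:title>
  <b:code>x</b:code>
  <a:code>y</a:code>
</root>"""
root = ET.fromstring(doc)

print(find_unique_by_local_name(root, "title").text)  # one
print(find_unique_by_local_name(root, "code"))        # None: ambiguous
print(find_unique_by_local_name(root, "missing"))     # None: no match
```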
I extended _findFollowingSibling as follows:
cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                         const_xmlChar* href,
                                         const_xmlChar* name,
                                         Py_ssize_t index):
    cdef tree.xmlNode* (*next)(tree.xmlNode*)
    cdef tree.xmlNode* start_node = c_node
    cdef tree.xmlNode* result_node = NULL
    cdef Py_ssize_t start_index
    cdef int found = 0
    if index >= 0:
        next = cetree.nextElement
    else:
        index = -1 - index
        next = cetree.previousElement
    start_index = index
    # first pass: search with namespace (original behavior)
    while c_node is not NULL:
        if c_node.type == tree.XML_ELEMENT_NODE and \
                _tagMatches(c_node, href, name):
            index = index - 1
            if index < 0:
                return c_node
        c_node = next(c_node)
    # second pass: search by local name only, ignoring the namespace;
    # restart from the original node and the original index
    c_node = start_node
    index = start_index
    while c_node is not NULL:
        if c_node.type == tree.XML_ELEMENT_NODE and c_node.name == name:
            index = index - 1
            if index < 0:
                result_node = c_node
                found += 1
        c_node = next(c_node)
    # only answer if exactly one match was found
    if found == 1:
        return result_node
    return NULL
Sorry for my clumsy Cython, but it works perfectly well. I also preserved the existing behavior of looking up in the parent namespace first.
>>> node.fileIdentifier.CharacterString
'4157d397-e2c3-4e6e-8a84-0712aa9c1162'
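For readers without a Cython build, the two-pass lookup of the patch can be modelled in plain Python with xml.etree.ElementTree. This is a rough sketch with several assumptions. It covers only index 0, which corresponds to plain attribute access. The helper names are hypothetical. It does not follow the real lxml code path. The sample document is an illustrative ISO 19139 fragment.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix from a Clark-notation tag."""
    return tag.rsplit("}", 1)[-1]


def namespace_of(tag: str) -> str:
    """Return the namespace URI of a Clark-notation tag ('' if none)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def lookup_child(parent: ET.Element, name: str) -> Optional[ET.Element]:
    """Hypothetical model of the patched lookup for index 0."""
    ns = namespace_of(parent.tag)
    qualified = f"{{{ns}}}{name}" if ns else name
    # Pass 1: the unqualified name inherits the parent's namespace.
    for child in parent:
        if child.tag == qualified:
            return child
    # Pass 2: fall back to a match on the local name alone,
    # but only if that match is unambiguous.
    matches = [child for child in parent if local_name(child.tag) == name]
    return matches[0] if len(matches) == 1 else None


GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
doc = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:fileIdentifier>
    <gco:CharacterString>4157d397-e2c3-4e6e-8a84-0712aa9c1162</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>"""
node = ET.fromstring(doc)

file_id = lookup_child(node, "fileIdentifier")    # found in pass 1 (gmd)
value = lookup_child(file_id, "CharacterString")  # found in pass 2 (gco)
print(value.text)  # 4157d397-e2c3-4e6e-8a84-0712aa9c1162
```

Pass 1 keeps the current objectify semantics, so code that relies on a same-namespace child still resolves to it. Pass 2 only applies when pass 1 finds nothing. The sketch ignores the index argument that the Cython version also handles.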
I would really appreciate it if someone could test this proof of concept: https://github.com/Inqbus/lxml, branch better-objectify-attributes. If the feedback is positive, I will prepare a pull request.

Cheers,
Volker
--
=========================================================
inqbus Scientific Computing    Dr. Volker Jaenisch
Hungerbichlweg 3               +49 (8860) 9222 7 92
86977 Burggen                  https://inqbus.de
=========================================================