Variable namespaces, xml and finding all the URIs

I have some XML generated by an idiotic application that embeds version information of the generating software - not semantically relevant to the XML - into it, e.g.

    <info xmlns="http://vendor/base">
      <item xmlns="http://vendor/os/1.2.3/item-info">

This is incredibly irritating to work with if I want to write xpath queries which are vaguely sane and maintainable.

I cannot discard the namespace info.

I want to extract a namespaces dict like this:

    {
      'base': 'http://vendor/base',
      'item-info': 'http://vendor/os/1.2.3/item-info',
    }

I want to hide this from the users of my library - it's aimed at relatively non-expert users - so I want to use an xpath "helper" method, possibly on an Element sub-class, like this:

    root.xpath_vendor('base:info/item-info:item/...')

As far as I can tell, my only option is to walk the *entire* document and build the namespace map. The actual application is a stream of small documents, so I'm doing this thousands of times a second.

Is there any more efficient way? IIUC, sub-classes of Element cannot maintain any state as they can be deleted and re-instantiated at will, so I can't cache these namespaces at parse time and keep a reference to the cache?
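For reference, the "walk the entire document" baseline described above might look roughly like the sketch below. It is written against the standard library's xml.etree.ElementTree so that it runs without extra dependencies; the names build_nsmap and DOC, the sample document, and the rule "use the last path segment of the URI as the prefix" are assumptions made for illustration, inferred from the dict shown in the post. Paths are relative to the root element here, so the query starts below <info>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modelled on the snippet in the post.
DOC = (
    '<info xmlns="http://vendor/base">'
    '<item xmlns="http://vendor/os/1.2.3/item-info">x</item>'
    '</info>'
)


def build_nsmap(root):
    """Walk every element and map the last URI path segment to the full URI."""
    nsmap = {}
    for el in root.iter():
        tag = el.tag
        # Namespaced tags are stored as '{uri}localname'.
        if isinstance(tag, str) and tag.startswith('{'):
            uri = tag[1:].partition('}')[0]
            prefix = uri.rstrip('/').rsplit('/', 1)[-1]
            # Keep the first URI seen for a prefix (assumption: no collisions).
            nsmap.setdefault(prefix, uri)
    return nsmap


def xpath_vendor(root, path):
    """Hypothetical helper: rebuild the map on each call, then query."""
    return root.findall(path, namespaces=build_nsmap(root))


root = ET.fromstring(DOC)
nsmap = build_nsmap(root)
items = xpath_vendor(root, 'item-info:item')
```

This rebuilds the map on every call, which is exactly the per-document cost the post is asking how to avoid.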

Phil Mayers wrote on 04.12.2015 at 15:53:
> I have some XML generated by an idiotic application that embeds version information of the generating software - not semantically relevant to the XML - into it, e.g.
>
>     <info xmlns="http://vendor/base">
>       <item xmlns="http://vendor/os/1.2.3/item-info">
>
> This is incredibly irritating to work with if I want to write xpath queries which are vaguely sane and maintainable.
>
> I cannot discard the namespace info.

"cannot discard" as in "cannot remove from tree since you still need it afterwards"? Or as in "cannot disregard in XPath expressions since there may be tag name collisions across different namespaces"? Meaning, would a wildcard namespace match like "*[local-name() = 'item']" in XPath or "{*}item" in ElementPath work for you?
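A minimal sketch of the wildcard idea, assuming tag name collisions across namespaces are not a concern. It uses the standard library's xml.etree.ElementTree (Python 3.8 or later for the "{*}" form) so that it runs without lxml; the sample document and the local_name helper are made up for illustration. The XPath form with local-name() needs a full XPath engine such as lxml's xpath() and is not supported by the standard library, so the second half shows a manual local-name comparison instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modelled on the snippet in the post.
DOC = (
    '<info xmlns="http://vendor/base">'
    '<item xmlns="http://vendor/os/1.2.3/item-info">x</item>'
    '</info>'
)

root = ET.fromstring(DOC)

# '{*}item' matches child elements named 'item' in any namespace,
# so the versioned URI never has to be spelled out.
wild_items = root.findall('{*}item')


def local_name(el):
    """Strip a leading '{uri}' from a tag, leaving the local name."""
    return el.tag.rpartition('}')[2]


# Manual equivalent: compare local names while walking the tree.
manual_items = [el for el in root.iter() if local_name(el) == 'item']
```

Neither variant needs a namespace map, so there is nothing to build or cache per document.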

> I want to extract a namespaces dict like this:
>
>     {
>       'base': 'http://vendor/base',
>       'item-info': 'http://vendor/os/1.2.3/item-info',
>     }
>
> I want to hide this from the users of my library - it's aimed at relatively non-expert users - so I want to use an xpath "helper" method, possibly on an Element sub-class, like this:
>
>     root.xpath_vendor('base:info/item-info:item/...')

"use" as in "use internally" or "provide to the users of your library"? Difference being the limited internal requirements that you control versus the arbitrarily complex requirements of external users.

Stefan

Participants (2): Phil Mayers, Stefan Behnel