I have some XML generated by an idiotic application that embeds version
information of the generating software - not semantically relevant to
the XML - into it, e.g.
<info xmlns="http://vendor/base">
<item xmlns="http://vendor/os/1.2.3/item-info">
This is incredibly irritating to work with if I want to write xpath
queries which are vaguely sane and maintainable.
I cannot discard the namespace info.
I want to extract a namespaces dict like this:
{
'base': 'http://vendor/base',
'item-info': 'http://vendor/os/1.2.3/item-info',
}
I want to hide this from the users of my library - it's aimed at
relatively non-expert users - so I want to use an xpath "helper" method,
possibly on an Element sub-class, like this:
root.xpath_vendor('base:info/item-info:item/...')
As far as I can tell, my only option is to walk the *entire* document
and build the namespace map. The actual application is a stream of small
documents, so I'm doing this thousands of times a second.
Is there any more efficient way? IIUC, sub-classes of Element cannot
maintain any state as they can be deleted and re-instantiated at will,
so I can't cache these namespaces at parse time and keep a reference to
the cache?