lxml - The Python XML Toolkit


lxml@python.org

December 2015


Variable namespaces, xml and finding all the URIs
by Phil Mayers 06 Dec '15


I have some XML generated by an idiotic application that embeds version information of the generating software - not semantically relevant to the XML - into it, e.g.

    <info xmlns="http://vendor/base">
      <item xmlns="http://vendor/os/1.2.3/item-info">

This is incredibly irritating to work with if I want to write xpath queries which are vaguely sane and maintainable. I cannot discard the namespace info.

I want to extract a namespaces dict like this:

    {
      'base': 'http://vendor/base',
      'item-info': 'http://vendor/os/1.2.3/item-info',
    }

I want to hide this from the users of my library - it's aimed at relatively non-expert users - so I want to use an xpath "helper" method, possibly on an Element sub-class, like this:

    root.xpath_vendor('base:info/item-info:item/...')

As far as I can tell, my only option is to walk the *entire* document and build the namespace map. The actual application is a stream of small documents, so I'm doing this thousands of times a second.

Is there any more efficient way? IIUC, sub-classes of Element cannot maintain any state as they can be deleted and re-instantiated at will, so I can't cache these namespaces at parse time and keep a reference to the cache?
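One possible way to avoid a second walk over the tree is to collect the namespace declarations while the document is being parsed. The sketch below is an illustration only, not something proposed in the post and not benchmarked: it uses the "start-ns" event of iterparse(), and the helper names short_name() and parse_with_namespaces(), as well as the rule "use the last path segment of the URI as the prefix", are assumptions made for this example. It falls back to the standard library's ElementTree (whose iterparse() accepts the same event) so that it also runs where lxml is not installed.

```python
import io

try:
    from lxml import etree  # the toolkit discussed in this thread
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree


def short_name(uri):
    """Hypothetical naming rule: use the last path segment of the URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def parse_with_namespaces(data):
    """Parse bytes and return (root, {short_name: uri}).

    Namespace declarations are gathered from "start-ns" events during
    parsing, so no separate pass over the finished tree is needed.
    If two URIs share a last segment, the later one wins.
    """
    namespaces = {}
    parser = etree.iterparse(io.BytesIO(data), events=("start-ns",))
    for _event, (_prefix, uri) in parser:
        namespaces[short_name(uri)] = uri
    # The root element is available once the source is fully read.
    return parser.root, namespaces


# Sample document modelled on the fragment in the post (closed here so it parses).
doc = (
    b'<info xmlns="http://vendor/base">'
    b'<item xmlns="http://vendor/os/1.2.3/item-info">x</item>'
    b'</info>'
)
root, ns = parse_with_namespaces(doc)
items = root.findall("item-info:item", ns)
```

With lxml, the same dict can be passed along with an expression, e.g. root.xpath('item-info:item', namespaces=ns), so a module-level helper function taking (root, ns, expr) would not need any state stored on an Element sub-class.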
