On Wed, 13 May 2015, Stefan Behnel wrote:
Date: Wed, 13 May 2015 17:45:23 +0200
From: Stefan Behnel <stefan_ml@behnel.de>
To: lxml mailing list <lxml@lxml.de>
Subject: Re: [lxml] cleanup namespaces and XML elements with QNames
Tom Kralidis wrote on 13.05.2015 at 16:34:
On Wed, 13 May 2015, Stefan Behnel wrote:
I'm getting this as output:
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
  <ogc:typeName>gml:Envelope</ogc:typeName>
  <ogc:typeName>gml:Envelope</ogc:typeName>
</ogc:Filter>
So your problem is that "gml:Envelope" is actually text content and not structure, which means that lxml ignores it when removing unused namespace declarations (and the "gml" prefix is really unused, except that some downstream processor wants it to be there).
QName() is only special at the time of assignment. Afterwards, it's turned into regular text content. So there is no way lxml could figure out later that it's something it needs to care about.
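To illustrate the point above, here is a minimal sketch. The element names mirror the output quoted earlier; the variable names are made up for this example and are not from the original code.

```python
from lxml import etree

OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"

# Declare both prefixes on the root element.
root = etree.Element("{%s}Filter" % OGC, nsmap={"ogc": OGC, "gml": GML})
child = etree.SubElement(root, "{%s}typeName" % OGC)

# The QName is resolved against the declarations in scope on assignment ...
child.text = etree.QName(GML, "Envelope")
# ... but what ends up in the tree is an ordinary string.
text_after_assignment = child.text

# No element or attribute name uses "gml", so its declaration is dropped.
etree.cleanup_namespaces(root)
serialized = etree.tostring(root).decode()
print(serialized)
```

After the cleanup, the text still reads "gml:Envelope", but the serialized root no longer declares the "gml" prefix.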
Is there value to adding an optional argument to etree.cleanup_namespaces (like preserve_qname_text_content=False or something) which could be implemented? This would then require some sort of register of element text which is QName'd.
I'd rather add an option to prevent a specific sequence/set of prefixes from being removed from the tree. Together with the new option "top_nsmap" in the next version, that would allow you to move certain declarations to the root element and prevent them from being dropped.
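As a rough sketch of what such an option looks like in use: the keyword name keep_ns_prefixes is not named in this thread. It is assumed here to be the one available in lxml 3.5 and later, so this will not run on older releases.

```python
from lxml import etree

OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"

root = etree.Element("{%s}Filter" % OGC, nsmap={"ogc": OGC, "gml": GML})
child = etree.SubElement(root, "{%s}typeName" % OGC)
child.text = etree.QName(GML, "Envelope")

# Assumption: lxml >= 3.5, where cleanup_namespaces() accepts keep_ns_prefixes.
# The "gml" declaration survives although no element or attribute name uses it.
etree.cleanup_namespaces(root, keep_ns_prefixes=["gml"])
kept = etree.tostring(root).decode()
print(kept)
```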
Finding out which prefixes are used in text content is an application-specific problem and should be handled on that side. I'd be surprised if this needed more than 5 lines of Python code in your case.
Thanks for the info. Something like the below would do the trick I'm guessing?

    etree.cleanup_namespaces(root)
    for xpath in root.xpath('//text()'):
        if ':' in xpath:
            prefix, _ = xpath.split(':')
            if prefix in nsmap:
                root.nsmap[prefix] = nsmap[prefix]

not sure if/how expensive this would be or if there are more efficient approaches?

Thanks

..Tom
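One caveat with the snippet above: the dict returned by the nsmap property is a copy, so assigning into root.nsmap does not change the tree. A possible variant, sketched under the assumption of lxml 3.5 or later (for keep_ns_prefixes), collects the prefixes before the cleanup and asks lxml to keep them. The helper name prefixes_used_in_text and the "unused" namespace are invented for this example, and the helper only inspects element text, not tails or attribute values.

```python
from lxml import etree


def prefixes_used_in_text(root):
    """Collect declared prefixes that appear as 'prefix:local' element text.

    Hypothetical helper, a heuristic only: any text of the form 'a:b'
    whose 'a' is a prefix in scope counts as a QName.
    """
    used = set()
    for el in root.iter("*"):  # elements only, skips comments and PIs
        text = (el.text or "").strip()
        prefix, sep, local = text.partition(":")
        # el.nsmap includes declarations inherited from ancestors.
        if sep and local and prefix in el.nsmap:
            used.add(prefix)
    return used


root = etree.fromstring(
    '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" '
    'xmlns:gml="http://www.opengis.net/gml" '
    'xmlns:unused="urn:example:unused">'
    '<ogc:typeName>gml:Envelope</ogc:typeName>'
    '</ogc:Filter>'
)

keep = prefixes_used_in_text(root)
# Assumption: lxml >= 3.5, where cleanup_namespaces() accepts keep_ns_prefixes.
etree.cleanup_namespaces(root, keep_ns_prefixes=sorted(keep))
result = etree.tostring(root).decode()
print(result)
```

This walks the tree once, so the cost grows linearly with the number of elements.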