[lxml-dev] adding a namespace

I am having some problems adding a new namespace to a parsed document. My goal is to take an input file like this:

    <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
    <div id="one"><p>first paragraph</p></div>
    <div id="two"><p>second paragraph</p></div>
    </body>
    </html>

and turn it into this:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:i18n="http://xml.zope.org/namespaces/i18n">
    <body>
    <div id="one"><p i18n:translate="string1">first paragraph</p></div>
    <div id="two"><p i18n:translate="string2">second paragraph</p></div>
    </body>
    </html>

The code is fairly simple, and looks like this (simplified from original):

    NS="http://xml.zope.org/namespaces/i18n"
    tree=lxml.etree.parse(input)
    root=tree.getroot()
    count=1
    if "i18n" not in root.nsmap:
        root.nsmap["i18n"]=NS
    for el in root.iter():
        if "{%s}translate" % NS in el.attrib:
            continue
        if hasText(el):
            el.attrib["{%s}translate" % NS]="string%d" % count
            count+=1
    print lxml.etree.tostring(tree)

However the resulting output looks like this:

    <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
    <div id="one"><p xmlns:ns0="http://xml.zope.org/namespaces/i18n" ns0:translate="string1">first paragraph</p></div>
    <div id="two"><p xmlns:ns1="http://xml.zope.org/namespaces/i18n" ns1:translate="string2">second paragraph</p></div>
    </body>
    </html>

While trying to debug this I noticed something odd: lxml allows you to modify the nsmap for an element, but ignores what you do:
I would expect that to either work, or raise an exception telling me I am trying to do something that is not allowed. The current behaviour feels a bit unpythonic.

It is possible to specify your own nsmap when creating elements, but I cannot find an API to modify the nsmap for a parsed tree. Is that a missing feature, or is there another way to do this?

Wichert.

On Tue, 2010-03-23 at 08:33 +0100, Wichert Akkerman wrote:
You could try something like this:

====================================
from lxml import etree

NS="http://xml.zope.org/namespaces/i18n"
tree=etree.parse(input)
root=tree.getroot()
count=1
if "i18n" not in root.nsmap:
    new_root = etree.Element(root.tag, nsmap=dict(i18n=NS, **root.nsmap))
    new_root[:] = root[:]
for el in new_root.iter():
    if "{%s}translate" % NS in el.attrib:
        continue
    if el.text is not None and el.text.strip() != '':
        el.attrib["{%s}translate" % NS]="string%d" % count
        count+=1
print etree.tostring(new_root)
====================================

Is that what you had in mind?

simon
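
For readers on Python 3 and a current lxml, the same idea can be sketched roughly as follows. It merges the prefix maps as plain dicts instead of using `dict(i18n=NS, **root.nsmap)`, since the default namespace is stored under the key None, which cannot be passed as a keyword argument in Python 3. It also copies the root's attributes and text, which the snippet above leaves behind. The helper name `add_prefix_to_root` and the inline `SOURCE` document are made up for illustration; treat this as a sketch, not a drop-in fix.

```python
from lxml import etree

NS = "http://xml.zope.org/namespaces/i18n"

# Hypothetical sample input, modelled on the document from the first mail.
SOURCE = b"""<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div id="one"><p>first paragraph</p></div>
<div id="two"><p>second paragraph</p></div>
</body></html>"""


def add_prefix_to_root(root, prefix, uri):
    """Build a replacement root that also declares `prefix` (hypothetical helper)."""
    nsmap = dict(root.nsmap)      # copy; may contain None for the default namespace
    nsmap[prefix] = uri
    new_root = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=nsmap)
    new_root.text = root.text
    new_root.extend(list(root))   # moves the children out of the old root
    return new_root


root = etree.fromstring(SOURCE)
new_root = add_prefix_to_root(root, "i18n", NS)

count = 1
for el in new_root.iter():
    if "{%s}translate" % NS in el.attrib:
        continue
    if el.text is not None and el.text.strip():
        el.set("{%s}translate" % NS, "string%d" % count)
        count += 1

result = etree.tostring(new_root, encoding="unicode")
print(result)
```

Because the prefix is now declared on an ancestor, the attributes set in the loop should pick up the `i18n` prefix rather than a generated `ns0`/`ns1`.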

On 3/23/10 09:41 , Simon Wiles 魏希明 wrote:
Almost! The problem with this approach is that you lose the doctype, since that is serialised as part of tree.docinfo, while you are now only outputting the root and its children. As a workaround I could manually output tree.docinfo.doctype I suppose.

Wichert.
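
The doctype workaround mentioned here could look something like the sketch below: read `tree.docinfo.doctype` from the original tree and write it in front of the serialised replacement root. The sample document and the variable names are assumptions for illustration only.

```python
import io

from lxml import etree

NS = "http://xml.zope.org/namespaces/i18n"

# Hypothetical sample input with a doctype.
SOURCE = b"""<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>first paragraph</p></body></html>"""

tree = etree.parse(io.BytesIO(SOURCE))
root = tree.getroot()

# Remember the doctype of the original tree; it is an empty string if there is none.
doctype = tree.docinfo.doctype

# Build the replacement root with the extra prefix, as in the earlier suggestion.
nsmap = dict(root.nsmap)
nsmap["i18n"] = NS
new_root = etree.Element(root.tag, nsmap=nsmap)
new_root.extend(list(root))

# Serialise only the new root and put the saved doctype in front of it by hand.
body = etree.tostring(new_root, encoding="unicode")
output = doctype + "\n" + body if doctype else body
print(output)
```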

Hi, bumping this thread was a good idea, it seems. ;) Wichert Akkerman, 18.03.2010 15:16:
Ok, this won't work as the return value of the nsmap property is a newly created dict. The reason is that it returns a map of all prefixes that are defined in the context of the Element, including all live prefixes defined on its ancestors. I've added a short section to the tutorial that explains this (not on the website yet).
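
A small sketch of the behaviour described here, assuming a current lxml under Python 3: each access to `nsmap` builds a fresh dict, so writing to it never reaches the tree, and a child element reports the prefixes it inherits from its ancestors. The sample document is made up for illustration.

```python
from lxml import etree

NS = "http://xml.zope.org/namespaces/i18n"
XHTML = "http://www.w3.org/1999/xhtml"

root = etree.fromstring(b'<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
body = root[0]

first = root.nsmap
second = root.nsmap
print(first is second)          # two separate dict objects

# Mutating the returned dict only changes that copy, not the element.
first["i18n"] = NS
print("i18n" in first)
print("i18n" in root.nsmap)

# The child declares nothing itself but still sees the inherited default namespace.
inherited = body.nsmap
print(inherited)
```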
You get a plain dict here, so an exception won't work. It would also be unfriendly to return a read-only dict (which would raise an exception on changes) as it's quite reasonable to use the dict in other places of your code.
Simon showed you a way, but apart from that, it's a missing feature. Changing namespace mappings is nothing that the ElementTree API needs to care about, and lxml clearly lacks a good way to do it. Could you file a ticket on the bug tracker? This should be doable for 2.3.

Stefan
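
For what it is worth, later lxml releases (3.5 and newer, as far as I know) let `etree.cleanup_namespaces()` take a `top_nsmap` argument that declares the given prefixes on the top element and folds redundant declarations further down into them. Assuming such a version is installed, the original goal can be sketched without rebuilding the root; the sample document is again made up for illustration.

```python
from lxml import etree

NS = "http://xml.zope.org/namespaces/i18n"

# Hypothetical sample input, modelled on the document from the first mail.
SOURCE = b"""<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div id="one"><p>first paragraph</p></div>
<div id="two"><p>second paragraph</p></div>
</body></html>"""

root = etree.fromstring(SOURCE)

# Set the attributes first; lxml invents a prefix for the undeclared namespace.
count = 1
for el in root.iter():
    if el.text is not None and el.text.strip():
        el.set("{%s}translate" % NS, "string%d" % count)
        count += 1

# Assumes lxml >= 3.5: declare i18n on the root and drop the per-element declarations.
etree.cleanup_namespaces(root, top_nsmap={"i18n": NS})

after = etree.tostring(root, encoding="unicode")
print(after)
```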

On 3/23/10 20:09 , Stefan Behnel wrote:
Most certainly: https://bugs.launchpad.net/lxml/+bug/555602

Wichert.

participants (3)
- Simon Wiles 魏希明
- Stefan Behnel
- Wichert Akkerman