[lxml-dev] Strange behaviour with namespaces

Hi! I'd like to serialize a part of an XML document for later retrieval. Since the elements are defined using namespaces, I created an ElementTree instance for the element to be serialized and called its write_c14n method.
>>> from lxml import etree
>>> import cStringIO
>>> XML="""
... <n1:a xmlns:a="http://a.org" xmlns:b="http://b.org" xmlns:c="http://c.org">
... <n2:b>
... <n3:c/>
... </n2:b>
... </n1:a>"""
>>> e = etree.fromstring(XML)
>>> et = etree.ElementTree(e[0])
>>> sb = cStringIO.StringIO()
>>> et.write_c14n(sb)
>>> sb.getvalue()
'<b xmlns:a="http://a.org" xmlns:b="http://b.org" xmlns:c="http://c.org">\n <c></c>\n </b>'
The xmlns information is transported correctly, but the information about the namespaces for the elements is lost. I assume this is a bug. Is there another way to extract a part of the document in textual form such that namespace information is preserved? Using tostring does not work, since this method throws away the xmlns attributes altogether. Thanks & best regards, Albert Brandl

Hi,

-------- Original Message --------
Date: Mon, 23 Jul 2007 13:22:49 +0200
From: Albert Brandl <albert.brandl@tttech.com>
> The xmlns information is transported correctly, but the information about the namespaces for the elements is lost. I assume this is a bug.
You got your prefixes confused: the namespace prefixes of the elements do not correspond to any xmlns declaration (n1:a vs. xmlns:a="..."). Try this with consistent prefixes and it works like a charm:
>>> from lxml import etree
>>> import cStringIO
>>> XML="""
... <n1:a xmlns:n1="http://a.org" xmlns:n2="http://b.org"
... xmlns:n3="http://c.org">
... <n2:b>
... <n3:c/>
... </n2:b>
... </n1:a>"""
>>> e = etree.fromstring(XML)
>>> et = etree.ElementTree(e[0])
>>> sb = cStringIO.StringIO()
>>> et.write_c14n(sb)
>>> sb.getvalue()
'<n1:a xmlns:n1="http://a.org" xmlns:n2="http://b.org" xmlns:n3="http://c.org"><n2:b><n3:c></n3:c></n2:b></n1:a>'
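For Python 3 and a current lxml release, the session above might be ported roughly as follows. This is a sketch under those assumptions: cStringIO no longer exists, and write_c14n() writes bytes, so io.BytesIO takes its place; the helper name c14n_of_first_child and the constants MISMATCHED/CONSISTENT are hypothetical. With recent lxml versions the mismatched prefixes from the first post are expected to be rejected with an XMLSyntaxError at parse time, while the consistent version canonicalizes the sub-element with its namespaces intact.

```python
import io

from lxml import etree

# Prefixes as in the first post: n1/n2/n3 are used, but only a/b/c are declared.
MISMATCHED = (
    '<n1:a xmlns:a="http://a.org" xmlns:b="http://b.org" xmlns:c="http://c.org">'
    "<n2:b><n3:c/></n2:b></n1:a>"
)

# Consistent prefixes: every element prefix has a matching xmlns:<prefix> declaration.
CONSISTENT = (
    '<n1:a xmlns:n1="http://a.org" xmlns:n2="http://b.org" xmlns:n3="http://c.org">'
    "<n2:b><n3:c/></n2:b></n1:a>"
)


def c14n_of_first_child(xml_text):
    """Hypothetical helper: canonicalize the root's first child, as in the thread."""
    root = etree.fromstring(xml_text)
    tree = etree.ElementTree(root[0])  # wrap the sub-element, as in the original session
    buf = io.BytesIO()                 # write_c14n() writes bytes, so use BytesIO
    tree.write_c14n(buf)
    return buf.getvalue()


try:
    c14n_of_first_child(MISMATCHED)
except etree.XMLSyntaxError as exc:
    # Expected with current lxml: an undeclared prefix is a parse error.
    print("rejected:", exc)

print(c14n_of_first_child(CONSISTENT).decode("utf-8"))
```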
> Is there another way to extract a part of the document in textual form such that namespace information is preserved? Using tostring does not work, since this method throws away the xmlns attributes altogether.
No, it doesn't:
>>> etree.tostring(e)
'<n1:a xmlns:n1="http://a.org" xmlns:n2="http://b.org" xmlns:n3="http://c.org"><n2:b><n3:c/></n2:b></n1:a>'
Regards,
Holger

Hi, On Mon, Jul 23, 2007 at 01:51:40PM +0200, jholg@gmx.de wrote:
> You got your prefixes confused: The ns-prefixes of the elements do not correspond to any xmlns-declaration (n1:a vs xmlns:a="...")

You are correct. Thanks for the quick reply.

> No it doesn't:
> >>> etree.tostring(e)
> '<n1:a xmlns:n1="http://a.org" xmlns:n2="http://b.org" xmlns:n3="http://c.org"><n2:b><n3:c/></n2:b></n1:a>'
This is much better than first wrapping the element in an ElementTree. It seems to have been fixed somewhere between lxml 1.1.2 and the current version:
>>> lxml.etree.LXML_VERSION
(1, 1, 2, 0)
>>> e = fromstring('<ns1:a xmlns:n1="http://a.org" xmlns:ns2="http://b.org" xmlns:ns3="http://c.org"><ns2:b><ns3:c/></ns2:b></ns1:a>')
>>> tostring(e[0])
'<ns2:b><ns3:c/></ns2:b>'
Looks like it's time to upgrade :-) Regards, Albert Brandl