[lxml-dev] Adding namespaced elements
It appears as though creating a namespaced element using etree.SubElement(d, 'm:bar', nsmap={'m': ns}) does not work correctly, while using the etree.SubElement(d, '{ns}bar') form does. The attached script hopefully illustrates the problem, and also the way that the serialised XML is incorrect in showing an element which was created with no namespace exactly the same as an element in the default namespace (though it appears the parser picks this up correctly when reparsing the output!?).
Am I misunderstanding something fundamental? Note that I tried this with both 0.7 and r16184 of the trunk.
Jamie
--
Artefact Publishing: http://www.artefact.org.nz/
GnuPG Public Key: http://www.artefact.org.nz/people/jamie.html
Hi there, Jamie Norrish wrote:
It appears as though creating a namespaced element using etree.SubElement(d, 'm:bar', nsmap={'m': ns}) does not work correctly, while using the etree.SubElement(d, '{ns}bar') form does.
That's because that isn't supposed to work -- the ElementTree API works solely with Clark notation for namespaces, not with namespace prefixes. nsmap is used to generate namespace *declarations* on the Element, so that during serialization the namespace prefixes are generated for that namespace URL.
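To make the distinction concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree, which implements the same ElementTree API. It is not lxml itself: ET.register_namespace() stands in for lxml's nsmap as the way to choose a serialization prefix, and the namespace URI is a made-up example.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI, for illustration only

# Stand-in for lxml's nsmap: ask the serializer to use the prefix "m" for NS.
ET.register_namespace("m", NS)

doc = ET.Element("doc")

# Clark notation: the namespace URI in braces, followed by the local name.
namespaced = ET.SubElement(doc, "{%s}bar" % NS)

# A prefixed string is not resolved against any mapping; the stdlib
# simply stores it as a literal tag name.
literal = ET.SubElement(doc, "m:bar")

xml_text = ET.tostring(doc, encoding="unicode")
print(namespaced.tag)
print(literal.tag)
print(xml_text)
```

Only the first child is bound to the namespace in the tree; the prefix is purely a serialization detail.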
The attached script hopefully illustrates the problem, and also the way that the serialised XML is incorrect in showing an element which was created with no namespace exactly the same as an element in the default namespace (though it appears the parser picks this up correctly when reparsing the output!?).
Hm, the latter, elements without namespace ending up in the default namespace, does sound like a real problem. I can reproduce it with this line:
e = etree.Element('foo', nsmap={None: 'http://default.com'})
where the XML serializes as:
<foo xmlns="http://default.com"/>
I'm not sure *what* we should expect in this case though, as how would one spell 'foo is in no namespace' in XML? I guess in that case the default namespace could be removed or changed to a namespace prefix that's autogenerated... Anyone have any ideas?
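A small sketch of why that serialized form is a problem, again using the standard library's xml.etree.ElementTree rather than lxml: a namespace-aware parser reads the output above as an element in the default namespace, which is not the same tag as an element created with no namespace.

```python
import xml.etree.ElementTree as ET

# The serialized form reported above.
serialized = '<foo xmlns="http://default.com"/>'

# Parsing it back yields an element bound to the default namespace.
reparsed = ET.fromstring(serialized)

# An element created with a plain tag is in no namespace at all.
unnamespaced = ET.Element("foo")

print(reparsed.tag)      # {http://default.com}foo
print(unnamespaced.tag)  # foo
```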
Am I misunderstanding something fundamental?
Yes, but you also pointed out a real problem. Thanks! Regards, Martijn
Martijn Faassen writes:
That's because that isn't supposed to work -- the ElementTree API works solely with Clark notation for namespaces, not with namespace prefixes.
Hmm. I'm guessing that there's no chance of it being introduced, either to the ElementTree API part of lxml, or as an additional part? Not that it's of great importance, it's just I'd like to be able to use one method consistently, and that's the method used in xpath() - which I use more frequently than the other means of navigating through a tree.
I'm not sure *what* we should expect in this case though, as how would one spell 'foo is in no namespace' in XML? I guess in that case the default namespace could be removed or changed to a namespace prefix that's autogenerated... Anyone have any ideas?
Your suggestion is what I would do, but I'm no expert.
Jamie
On Mon, 2005-08-22 at 18:43 +0200, Martijn Faassen wrote:
Hi there,
Jamie Norrish wrote:
It appears as though creating a namespaced element using etree.SubElement(d, 'm:bar', nsmap={'m': ns}) does not work correctly, while using the etree.SubElement(d, '{ns}bar') form does.
That's because that isn't supposed to work -- the ElementTree API works solely with Clark notation for namespaces, not with namespace prefixes.
nsmap is used to generate namespace *declarations* on the Element, so that during serialization the namespace prefixes are generated for that namespace URL.
The attached script hopefully illustrates the problem, and also the way that the serialised XML is incorrect in showing an element which was created with no namespace exactly the same as an element in the default namespace (though it appears the parser picks this up correctly when reparsing the output!?).
Hm, the latter, elements without namespace ending up in the default namespace, does sound like a real problem. I can reproduce it with this line:
e = etree.Element('foo', nsmap={None: 'http://default.com'})
where the XML serializes as:
<foo xmlns="http://default.com"/>
I'm not sure *what* we should expect in this case though, as how would one spell 'foo is in no namespace' in XML? I guess in that case the default namespace could be removed or changed to a namespace prefix that's autogenerated... Anyone have any ideas?
To disable the default namespace there's xmlns="", but it won't work here, since we are declaring a default namespace on this element. I guess in XML 1.1, we could use xmlns:foo="" to explicitly have a <foo:foo/> in no namespace; but that's ugly anyway.
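For reference, a short sketch of how xmlns="" behaves on a child element, using the standard library parser (not lxml) and a made-up document. The child that undeclares the default namespace ends up in no namespace, while its sibling inherits the default from the parent.

```python
import xml.etree.ElementTree as ET

# <b> undeclares the default namespace; <c> inherits it from <a>.
text = '<a xmlns="http://default.com"><b xmlns=""/><c/></a>'

root = ET.fromstring(text)

# iter() walks the root and its descendants in document order.
tags = [el.tag for el in root.iter()]
print(tags)  # ['{http://default.com}a', 'b', '{http://default.com}c']
```

This only helps for descendants; as noted above, it cannot be combined with a default namespace declaration on the same element.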
Am I misunderstanding something fundamental?
Yes, but you also pointed out a real problem. Thanks!
I'm always happy to gabble about how DOM works in this case ;-)
It is indeed a problem, which, in the case of DOM, is solved with the W3C DOM's namespace normalization algorithm [1], applied before serialization. In DOM you really are responsible for what you're doing, so if you add a namespace declaration at the wrong position it might unexpectedly change a node's namespace. That's the DOM user's error then. To avoid such errors there's the normalization mechanism, which on the other hand has the drawback that it can break QNames in text nodes, since it removes or creates namespace declarations as needed.
With Libxml2 such a mechanism is normally not possible, since one cannot simply remove ns-declarations, for they are referenced by subsequent nodes. That's exactly the reason I'm not following Libxml2's namespace philosophy in our Delphi DOM wrapper. Martijn, I talked enough about that at Libxml2's list, so you should know what I mean.
Mechanisms (possibly naive) I see here. Consider:
e = etree.Element('foo', nsmap={None: 'http://default.com'})
1. Do nothing; the user is responsible for namespace semantics.
2. Raise an error; the user is not allowed to change the namespace (or no namespace) of a node by declaring a namespace binding. This needs to be checked for all nodes in the descendant-or-self axis.
3. Define that the element creation shown above does indeed create an element in the default namespace 'http://default.com'. Define that no nsmap must be given if creating an element in no namespace.
Hmm, forget 2 and 3, since they do not protect us against adding a child element in no namespace to a parent in a default namespace, which, when serialized, would result in binding the child to the default namespace.
How about automatic namespace reconciliation when we add nodes to the tree, or user-driven, prior to serialization? Have a look at xmlDOMWrapReconcileNamespaces(), which we added as a replacement for xmlReconciliateNs(). Maybe this is what we need here.
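As a rough illustration of a user-driven check prior to serialization, the sketch below walks the descendant-or-self axis and reports elements in no namespace, which a default namespace declaration on an ancestor would capture. It only detects the situation and does not repair it. The helper name is hypothetical, it uses the standard library's xml.etree.ElementTree rather than lxml, and it is not what xmlDOMWrapReconcileNamespaces() does.

```python
import xml.etree.ElementTree as ET


def check_no_unnamespaced(root):
    """Hypothetical helper: raise ValueError if any element in the
    descendant-or-self axis of root is in no namespace."""
    offenders = [
        el.tag
        for el in root.iter()  # iter() includes root itself
        # Comments and processing instructions have non-string tags.
        if isinstance(el.tag, str) and not el.tag.startswith("{")
    ]
    if offenders:
        raise ValueError(
            "elements in no namespace would be captured by a default "
            "namespace declaration: %r" % offenders
        )


# A tree whose elements are all namespaced passes the check.
safe = ET.Element("{http://default.com}doc")
ET.SubElement(safe, "{http://default.com}bar")
check_no_unnamespaced(safe)
print("safe tree passed")
```

Running the check just before serialization, rather than at element creation, also covers the case of a child in no namespace added later to a parent in a default namespace.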
Additionally it would be a nice case to test the function with regard to "disabling the default namespace"; I dunno if this already worked.
[1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithm...
Regards,
Kasimier
participants (3)
- Jamie Norrish
- Kasimier Buchcik
- Martijn Faassen