On Tue, 22 Apr 2008 22:42:12 +0200 Stefan Behnel <stefan_ml@behnel.de> wrote:
Hi,
Andreas Degert wrote:
I assume it is legal to have the following namespace declaration/usage:
<top xmlns="a" xmlns:a="a" xmlns:b="b">
  <foo bar=""/>
  <b:foobar a:bar=""/>
</top>
Sure, the spec calls this well-formed XML - not talking aesthetics, though.
It works when I read such a definition with lxml.etree.parse, but I can't construct it with lxml.etree.Element because then the nsmap dict will be normalized in such a way that each URI occurs only once.
Finally someone complaining that there are too *few* namespace declarations instead of too many. ;o)
lxml does a lot of work behind the scenes to keep namespaces consistent and simple throughout whatever operation you perform at the API level. In the case you describe, lxml checks on each new namespace prefix declaration whether that namespace is already defined in the tree context of the Element, and reuses the old prefix if that is the case. The function that does that is _initNodeNamespaces() in apihelpers.pxi, in case you're interested.
Is this a bug in lxml or shouldn't it be used in this way?
I don't see the use case. What could you do with redundant namespace prefix declarations that you can't do with a single one?
I think the behaviour leads to a bug:

    from lxml.etree import Element, SubElement, tostring

    t = Element("top", nsmap={None: "a", "b": "b"})
    SubElement(t, "{b}foobar", {"{a}bar": ""})
    print(tostring(t, pretty_print=True, encoding="unicode"))

-----
<top xmlns="a" xmlns:b="b">
  <b:foobar bar=""/>
</top>
-----

In the output the attribute bar should have namespace a, but it has no namespace (the default namespace doesn't apply to attributes, as specified in http://www.w3.org/TR/REC-xml-names/#scoping-defaulting, section 6.2).

hmmm... even simpler example:

    Element("top", {"bar": "", "{a}bar": ""}, nsmap={None: "a", "b": "b"})

yields

    <top xmlns="a" xmlns:b="b" bar="" bar=""/>
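To make the problem with that last output concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree as a stand-in (an assumption for illustration: it is not lxml, and it says nothing about how any particular lxml version behaves). It checks that the reported output is rejected by a parser because of the duplicate attribute, and that the same attribute set round-trips once the namespaced attribute is written with a prefix. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

# Output reported in the thread: two attributes that both serialize as "bar".
REPORTED = '<top xmlns="a" xmlns:b="b" bar="" bar=""/>'


def is_well_formed(xml_text):
    """Hypothetical helper: True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Same attribute set, built with the standard library instead of lxml.
elem = ET.Element("top", {"bar": "", "{a}bar": ""})
serialized = ET.tostring(elem, encoding="unicode")
reparsed = ET.fromstring(serialized)

print(is_well_formed(REPORTED))        # the duplicate attribute is rejected
print(serialized)                      # namespaced attribute gets a prefix
print(reparsed.attrib == elem.attrib)  # both attributes survive a round trip
```

The generated prefix is arbitrary; what matters is that "{a}bar" is serialized with some prefix bound to "a", so it stays distinct from the unqualified "bar".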
Imagine you have two prefixes defined for a namespace and you add a subelement with that namespace. Which prefix should be used? What purpose does that ambiguity serve?
The default namespace is a special case because it doesn't apply to attributes (so an attribute that has a namespace must be serialized with a prefix). When serializing elements, the default namespace should have a higher priority, i.e. those elements can be written without a prefix.
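The attribute rule can be checked by parsing the document from the start of the thread. This sketch again uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made only so the example is self-contained); it prints the resolved names in {uri}local notation.

```python
import xml.etree.ElementTree as ET

# The document from the original question, with "a" bound both as the
# default namespace and to the prefix "a".
DOC = (
    '<top xmlns="a" xmlns:a="a" xmlns:b="b">'
    '<foo bar=""/>'
    '<b:foobar a:bar=""/>'
    '</top>'
)

root = ET.fromstring(DOC)
foo, foobar = list(root)

print(root.tag)        # {a}top     (default namespace applies to elements)
print(foo.tag)         # {a}foo
print(foo.attrib)      # {'bar': ''}     (unprefixed attribute: no namespace)
print(foobar.tag)      # {b}foobar
print(foobar.attrib)   # {'{a}bar': ''}  (needs the explicit "a" prefix)
```

So the extra xmlns:a="a" declaration is the only way for a:bar to land in namespace "a" here, while the elements can rely on the default declaration.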
Stefan