[lxml-dev] pretty-printing

Hi!
I've posted a question about pretty-printing some time ago. Now I finally found the time to construct an example.
The problem occurs with the following code:
    from lxml.etree import Element, ElementTree

    nsmap = dict(foo="http://foo.org", bar="http://bar.org")
    e = Element("{http://foo.org}somefoo", nsmap=nsmap)
    s = Element("{http://bar.org}somebar", nsmap=nsmap)
    e.append(s)
    et = ElementTree(e)
    et.write("foo.xml", pretty_print=True)
This code creates the following XML file:
    <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
      <bar:somebar xmlns:foo="http://foo.org" xmlns:bar="http://bar.org"/>
    </foo:somefoo>
The problem is that "bar:somebar" redundantly declares the namespaces for "foo" and "bar", which affects both the readability and the size of the XML file.
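One way to check a serialized file for this kind of redundancy is sketched below. It uses only the standard library's xml.etree.ElementTree rather than lxml, and the helper name redundant_ns_declarations is made up for illustration. It reports every prefix/URI pair that is declared again while an identical binding is already in scope.

```python
import io
import xml.etree.ElementTree as ET


def redundant_ns_declarations(xml_text):
    """Return (prefix, uri) pairs declared again while already in scope.

    Hypothetical helper for illustration; it is not part of lxml.
    """
    in_scope = []     # stack of (prefix, uri) bindings currently in effect
    per_element = []  # how many bindings each open element introduced
    pending = 0       # declarations seen since the last "start" event
    redundant = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            prefix, uri = payload
            # Find the innermost binding of this prefix, if there is one.
            current = next(
                (u for p, u in reversed(in_scope) if p == prefix), None
            )
            if current == uri:
                redundant.append((prefix, uri))
            in_scope.append((prefix, uri))
            pending += 1
        elif event == "start":
            # The declarations seen so far belong to this element.
            per_element.append(pending)
            pending = 0
        else:  # "end": this element's bindings go out of scope
            for _ in range(per_element.pop()):
                in_scope.pop()
    return redundant
```

Calling it on the file content shown above should report both the "foo" and the "bar" declaration on bar:somebar, while a document that declares each namespace only once on the root yields an empty list.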
If the element "s" is appended to "e" by using the SubElement function instead of "append", the content of the XML file looks like I'd expect:
    <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
      <bar:somebar/>
    </foo:somefoo>
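For comparison, the SubElement variant described above might look like the following sketch. It assumes lxml is installed, and it serializes with etree.tostring instead of writing foo.xml, purely so the result can be inspected in memory. The variable names are illustrative.

```python
from lxml import etree

nsmap = {"foo": "http://foo.org", "bar": "http://bar.org"}

# The root carries both namespace declarations.
root = etree.Element("{http://foo.org}somefoo", nsmap=nsmap)

# SubElement creates the child directly inside the root, so the child
# can reuse the declarations that are already in scope.
etree.SubElement(root, "{http://bar.org}somebar")

xml = etree.tostring(root, pretty_print=True).decode("utf-8")
print(xml)
```

Because the child is created in the context of its parent, no second nsmap is passed and the "bar" prefix is declared only once, on the root.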
Is this a known bug?
Thanks & best regards,
Albert Brandl

Hi,
Albert Brandl wrote:
> I've posted a question about pretty-printing some time ago. Now I finally found the time to construct an example.
> The problem occurs with the following code:
>
>     from lxml.etree import Element, ElementTree
>
>     nsmap = dict(foo="http://foo.org", bar="http://bar.org")
>     e = Element("{http://foo.org}somefoo", nsmap=nsmap)
>     s = Element("{http://bar.org}somebar", nsmap=nsmap)
>     e.append(s)
>     et = ElementTree(e)
>     et.write("foo.xml", pretty_print=True)
>
> This code creates the following XML file:
>
>     <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
>       <bar:somebar xmlns:foo="http://foo.org" xmlns:bar="http://bar.org"/>
>     </foo:somefoo>
I get the same here.
> The problem is that "bar:somebar" redundantly declares the namespaces for "foo" and "bar", which affects both the readability and the size of the XML file.
True. It doesn't affect its semantics, though.
> If the element "s" is appended to "e" by using the SubElement function instead of "append", the content of the XML file looks like I'd expect:
>
>     <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
>       <bar:somebar/>
>     </foo:somefoo>
That case is easier to handle than the above, so lxml/libxml2 optimises it.
> Is this a known bug?
It's known - though not really a bug but rather an inconvenience. Currently, we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces when merging trees. This function shows the above behaviour. To fix this, we'd have to implement our own version, which is a bit tricky and just wasn't important enough to try to get right so far. Note that even libxml2 had a (minor) bug up to version 2.6.26 here, so it's really not trivial to get this kind of thing right.
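To illustrate the basic idea behind this kind of namespace cleanup, here is a deliberately simplified sketch in plain Python. It is not how xmlReconciliateNs() or lxml works internally; the function name and the dict-based model are assumptions made for illustration. It only decides which declarations on a newly attached child are still needed, given what the new parent already has in scope.

```python
def prune_redundant_declarations(in_scope, declared):
    """Return the child's declarations that are not already in scope.

    in_scope: prefix -> URI bindings visible at the new parent.
    declared: prefix -> URI bindings declared on the moved element.
    Hypothetical helper; a real implementation works on tree nodes.
    """
    return {
        prefix: uri
        for prefix, uri in declared.items()
        # Keep a declaration only if it changes the effective binding.
        if in_scope.get(prefix) != uri
    }


parent_scope = {"foo": "http://foo.org", "bar": "http://bar.org"}

# Same bindings as the parent: nothing needs to be redeclared.
same = prune_redundant_declarations(
    parent_scope, {"foo": "http://foo.org", "bar": "http://bar.org"}
)

# The child rebinds "bar" to another URI: that declaration must stay.
rebound = prune_redundant_declarations(
    parent_scope, {"foo": "http://foo.org", "bar": "http://other.example"}
)
```

A real implementation also has to re-point every element and attribute in the moved subtree at the surviving declarations and cope with prefix clashes deeper in the tree, which is part of why it is hard to get right.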
Stefan

Hi!
Thanks for the reply.
On Wed, Nov 22, 2006 at 09:40:43PM +0100, Stefan Behnel wrote:
> It's known - though not really a bug but rather an inconvenience. Currently, we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces when merging trees. This function shows the above behaviour. To fix this, we'd have to implement our own version, which is a bit tricky and just wasn't important enough to try to get right so far. Note that even libxml2 had a (minor) bug up to version 2.6.26 here, so it's really not trivial to get this kind of thing right.
It's not that important, and I'll find a way to use SubElement instead of append. I just wanted to make sure that you know about this somewhat unexpected behaviour.
Regards,
Albert

Hi again,
Stefan Behnel wrote:
> Albert Brandl wrote:
>> The problem occurs with the following code:
>>
>>     from lxml.etree import Element, ElementTree
>>
>>     nsmap = dict(foo="http://foo.org", bar="http://bar.org")
>>     e = Element("{http://foo.org}somefoo", nsmap=nsmap)
>>     s = Element("{http://bar.org}somebar", nsmap=nsmap)
>>     e.append(s)
>>     et = ElementTree(e)
>>     et.write("foo.xml", pretty_print=True)
>>
>> This code creates the following XML file:
>>
>>     <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
>>       <bar:somebar xmlns:foo="http://foo.org" xmlns:bar="http://bar.org"/>
>>     </foo:somefoo>
>>
>> Is this a known bug?
> It's known - though not really a bug but rather an inconvenience. Currently, we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces when merging trees. This function shows the above behaviour. To fix this, we'd have to implement our own version, which is a bit tricky and just wasn't important enough to try to get right so far. Note that even libxml2 had a (minor) bug up to version 2.6.26 here, so it's really not trivial to get this kind of thing right.
I finally took a(nother) shot at it and I now have an implementation that can avoid this kind of problem. It's currently stored in the "nscleanup" branch, but I will move it to the trunk ASAP. Please give it a try then, to see if it works nicely for you in other cases where you encountered this.
Stefan

Hoi
On 2006-12-04 08:49:22 +0100, Stefan Behnel behnel_ml@gkec.informatik.tu-darmstadt.de said:
> Hi again,
> Stefan Behnel wrote:
>> Albert Brandl wrote:
>>> The problem occurs with the following code:
>>>
>>>     from lxml.etree import Element, ElementTree
>>>
>>>     nsmap = dict(foo="http://foo.org", bar="http://bar.org")
>>>     e = Element("{http://foo.org}somefoo", nsmap=nsmap)
>>>     s = Element("{http://bar.org}somebar", nsmap=nsmap)
>>>     e.append(s)
>>>     et = ElementTree(e)
>>>     et.write("foo.xml", pretty_print=True)
>>>
>>> This code creates the following XML file:
>>>
>>>     <foo:somefoo xmlns:foo="http://foo.org" xmlns:bar="http://bar.org">
>>>       <bar:somebar xmlns:foo="http://foo.org" xmlns:bar="http://bar.org"/>
>>>     </foo:somefoo>
>>>
>>> Is this a known bug?
>> It's known - though not really a bug but rather an inconvenience. Currently, we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces when merging trees. This function shows the above behaviour. To fix this, we'd have to implement our own version, which is a bit tricky and just wasn't important enough to try to get right so far. Note that even libxml2 had a (minor) bug up to version 2.6.26 here, so it's really not trivial to get this kind of thing right.
> I finally took a(nother) shot at it and I now have an implementation that can avoid this kind of problem. It's currently stored in the "nscleanup" branch, but I will move it to the trunk ASAP. Please give it a try then, to see if it works nicely for you in other cases where you encountered this.
That has not made it to the latest release, has it? Any plans to get it in?

Hi,
Christian Zagrodnick wrote:
> On 2006-12-04 08:49:22 +0100, Stefan Behnel behnel_ml@gkec.informatik.tu-darmstadt.de said:
>> we use a function in libxml2 called xmlReconciliateNs() to fix the namespaces when merging trees. This function shows the above behaviour. To fix this, we'd have to implement our own version, which is a bit tricky and just wasn't important enough to try to get right so far. Note that even libxml2 had a (minor) bug up to version 2.6.26 here, so it's really not trivial to get this kind of thing right.
>> I finally took a(nother) shot at it and I now have an implementation that can avoid this kind of problem. It's currently stored in the "nscleanup" branch, but I will move it to the trunk ASAP. Please give it a try then, to see if it works nicely for you in other cases where you encountered this.
> That has not made it to the latest release, has it? Any plans to get it in?
It's still on the list. It didn't make it into 1.2, as I couldn't find the time to make it work correctly. It still doesn't pass all of our test cases.
I know for myself how important this change is and I'll try to get it in soon. The merge will just have to wait until it really works. This is a very critical function that can break a horrible lot of things in an unexpected way. Once it works, there will definitely be a beta version before it gets its final blessing.
Stefan
participants (4)
- Albert Brandl
- Christian Zagrodnick
- Stefan Behnel
- Stefan Behnel