
Hi,
I tried the namespace feature of lxml and one thing bothers me: creating a new element via Element("{myns}elem1") adds a namespace declaration to it, which is correct and just fine. However, when I append this new element to an existing one that also has a namespace declaration (say elem2), the declaration on my first element persists. I think this is unnecessary, and confusing when you read the serialized XML document. If I create new elements without a parent and put them into the tree later at various positions, I end up with a ton of namespace declarations all declaring the same namespace.
So is there a way I'm overlooking to tell lxml: when I append elem1 to elem2, please go through all parent elements to check for a namespace declaration matching the one of elem1, and if you find one, drop the declaration from elem1 itself?
Now I don't know whether the problem here is actually lxml or libxml2 itself, but I figured I'd start here and work my way through.
Andreas

Andreas Pakulat wrote:
That's due to libxml2. Internally, we are copying elements between documents, and the libxml2 representations of both keep namespace declarations.
It is confusing and even annoying, but not unnecessary. It actually saves a lot of work. In the case you describe, we'd have to walk up the XML hierarchy on each insertion and check the NS declarations of every ancestor against all namespaces of the newly inserted tree of elements until we find the ones that match. Then we'd have to walk back down towards the element and check that the prefixes we found have not been redeclared in between. If they have not, we'd have to go through the whole inserted element hierarchy and replace every occurrence of the default namespace, of the prefixes we found, and of the namespace each is now associated with by the new prefix (or remove these declarations, whichever is simpler). So I would not call this laziness unnecessary.
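The ancestor walk described above can be sketched roughly as follows. This is a toy model only, not lxml or libxml2 code: the `Node` class and the `find_inherited_prefix` helper are hypothetical names invented for illustration. It covers a single namespace URI; a real implementation would have to repeat the walk for every namespace used anywhere in the inserted subtree and then rewrite that subtree, which is where the cost comes from.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Toy element (hypothetical): namespace declarations plus a parent link."""
    tag: str
    nsdecls: Dict[str, str] = field(default_factory=dict)  # prefix -> URI
    parent: Optional["Node"] = None


def find_inherited_prefix(node: Node, uri: str) -> Optional[str]:
    """Return a prefix that an ancestor of `node` binds to `uri`, or None.

    A prefix only counts if no element closer to `node` has rebound it
    to a different URI (the "redeclared" check).
    """
    shadowed = set()  # prefixes already bound to another URI nearer to `node`
    ancestor = node.parent
    while ancestor is not None:
        for prefix, declared_uri in ancestor.nsdecls.items():
            if prefix in shadowed:
                continue
            if declared_uri == uri:
                return prefix
            shadowed.add(prefix)
        ancestor = ancestor.parent
    return None


root = Node("elem2", {"a": "myns"})

# "a" is rebound on the wrapper, but "b" still maps to the wanted URI.
wrapper = Node("wrapper", {"a": "otherns", "b": "myns"}, parent=root)
print(find_inherited_prefix(Node("elem1", {"ns0": "myns"}, parent=wrapper), "myns"))
# prints: b

# Here the only matching prefix ("a" on the root) has been redeclared.
wrapper2 = Node("wrapper", {"a": "otherns"}, parent=root)
print(find_inherited_prefix(Node("elem1", {"ns0": "myns"}, parent=wrapper2), "myns"))
# prints: None
```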
This would mean adding a keyword parameter (or whatever) to make this behaviour explicit. I personally don't think we'd want that behaviour, either explicitly or implicitly. The problem only really affects output, so it should not slow down the handling of elements.
> Now I don't know whether the problem here is actually lxml or libxml2 itself, but I figured I'd start here and work my way through.
It's more of a problem in libxml2. The library is rather inconsistent in the way it cleans up namespaces on serialization. You can see that in the included xmllint program. It has an option called '--nsclean' that is supposed to remove redundant namespace declarations, but I have not yet found it to work in any of the cases where I really needed NS cleanup. It seems to fail especially when the default namespace is re-declared. Another problem is related to libxslt: xsl:copy copies redundant namespace declarations, and even xmllint then has problems cleaning them up afterwards. So this is clearly a problem in libxml2. There isn't much lxml can do about it. The best solution IMHO would be to improve the XML serialization of libxml2.
Stefan

On 28.01.06 13:12:19, Stefan Behnel wrote:
Actually the internal state doesn't interest me at all, see below.
Yep, that's what I was thinking shortly after sending out the mail :-( You're absolutely right, reordering namespaces on each and every insert/remove of an element is surely too expensive. And this is only the "internal state", so I shouldn't have to bother with it anyway...
Hmm, so I wrote to the wrong ML it seems :-(
> So this is clearly a problem in libxml2. There isn't much lxml can do about it. The best solution IMHO would be to improve the XML serialization of libxml2.
OK, thanks for clarifying that (BTW: why didn't you tell me on python-de? ;-)). I'll go and try my luck with the libxml2 guys.
Andreas

Andreas Pakulat wrote:
Sorry, I actually hadn't thought about it enough in my first PyDE reply. In any case, I think it's better to have this Q&A on the lxml list, so that others can find it. Thanks for filing a bug report. It would be nice if you could report back if anything interesting happens over there.
Stefan

On 31.01.06 13:58:44, Stefan Behnel wrote:
It was immediately closed as "not a bug"... Anyway, the reported bug was that --nsclean doesn't work as expected (the NS prefixes need to be equal for --nsclean to work...), so I filed a wishlist bug for a new serializer option to remove redundant namespace declarations when transforming into text form. You can have a look at it here: http://bugzilla.gnome.org/show_activity.cgi?id=329347
Andreas
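To make the "equal prefixes" case concrete, here is a minimal sketch of a cleanup pass that could run at serialization time. It is a toy model under stated assumptions, not how xmllint or libxml2 implement anything: `Elem` and `strip_redundant_nsdecls` are hypothetical names. A declaration is dropped only when the same prefix is already bound to the same URI by an ancestor; the same URI under a different prefix is left alone, since removing it would require rewriting the prefixes used in the subtree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Elem:
    """Toy element (hypothetical): namespace declarations plus children."""
    tag: str
    nsdecls: Dict[str, str] = field(default_factory=dict)  # prefix -> URI
    children: List["Elem"] = field(default_factory=list)


def strip_redundant_nsdecls(elem: Elem,
                            in_scope: Optional[Dict[str, str]] = None) -> None:
    """Remove declarations that repeat an identical in-scope binding."""
    # Copy, so that sibling subtrees do not see each other's declarations.
    in_scope = dict(in_scope or {})
    for prefix, uri in list(elem.nsdecls.items()):
        if in_scope.get(prefix) == uri:
            del elem.nsdecls[prefix]   # same prefix, same URI: redundant
        else:
            in_scope[prefix] = uri     # new or overriding binding: keep it
    for child in elem.children:
        strip_redundant_nsdecls(child, in_scope)


root = Elem("elem2", {"ns0": "myns"}, [
    Elem("elem1", {"ns0": "myns"}),  # equal prefix and URI: removed
    Elem("elem3", {"ns1": "myns"}),  # same URI, other prefix: kept
])
strip_redundant_nsdecls(root)
print([child.nsdecls for child in root.children])
# prints: [{}, {'ns1': 'myns'}]
```

Because the pass runs once over the finished tree, it costs nothing on insertion, which matches the point that this is an output-only concern.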

On 28.01.06 13:12:19, Stefan Behnel wrote:
Correct.
I just tried without a re-declared default namespace and it still is not working. I'm going to check whether a bug is already open and report one if not. This shouldn't be that hard to do, at least not on serialization...
Andreas
