Difference between Element addnext() and insert() functions
Hello,

I've been scratching my head over the following question, and can't quite figure this out. Any comments are appreciated :)

http://stackoverflow.com/questions/23282241/lxml-difference-between-element-...

Thanks!
Jens

-- 
Jens Tröger
http://savage.light-speed.de/
Hi,
From: Jens Tröger <jens.troeger@light-speed.de>
> I've been scratching my head over the following question, and can't quite figure this out. Any comments are appreciated :)
>
> http://stackoverflow.com/questions/23282241/lxml-difference-between-element-addnext-and-insert-functions

?

Works just as documented:
    >>> from lxml import etree
    >>> root = etree.fromstring('<root><a/><b/><c/></root>')
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <a/>
      <b/>
      <c/>
    </root>

    >>> child = root[0]
    >>> help(root[0].addnext)
    Help on built-in function addnext:

    addnext(...)
        addnext(self, element)

        Adds the element as a following sibling directly after this element.

        This is normally used to set a processing instruction or comment after
        the root node of a document.  Note that tail text is automatically
        discarded when adding at the root level.

    >>> child.addnext(etree.Element('a_addnext'))
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <a/>
      <a_addnext/>
      <b/>
      <c/>
    </root>

    >>> help(root[0].addprevious)
    Help on built-in function addprevious:

    addprevious(...)
        addprevious(self, element)

        Adds the element as a preceding sibling directly before this element.

        This is normally used to set a processing instruction or comment before
        the root node of a document.  Note that tail text is automatically
        discarded when adding at the root level.

    >>> child.addprevious(etree.Element('a_addprevious'))
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <a_addprevious/>
      <a/>
      <a_addnext/>
      <b/>
      <c/>
    </root>

    >>> root.insert(root.index(child), etree.Element('root_insert'))
    >>> print etree.tostring(root, pretty_print=True)
    <root>
      <a_addprevious/>
      <root_insert/>
      <a/>
      <a_addnext/>
      <b/>
      <c/>
    </root>
Maybe you just missed that addnext() and addprevious() add siblings, whereas insert() inserts children?
    root = etree.fromstring('<root><a/><b/><c/></root>')
    child = root[0]

    child.addprevious(etree.Element('before_a'))
    # equivalent: root.insert(root.index(child), etree.Element('before_a'))

    child.addnext(etree.Element('after_a'))
    # equivalent: root.insert(root.index(child) + 1, etree.Element('after_a'))
Regards
Holger

Landesbank Baden-Wuerttemberg
Institution under public law (Anstalt des oeffentlichen Rechts)
Headquarters: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704, Amtsgericht (Local Court) Stuttgart
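Holger's equivalences describe where the new element ends up, but say nothing about tail text, which is what the rest of the thread turns on. As a minimal sketch of the insert() side, here is the same idea using the standard library's xml.etree.ElementTree instead of lxml (ElementTree has no addnext()/addprevious(), and its elements do not know their parent, so the parent is passed explicitly). The helper names add_next and add_previous are hypothetical. Because a tail is stored on the element it follows, inserting by index leaves every existing tail with its owner, so the new element lands after that tail text.

```python
import xml.etree.ElementTree as ET


def add_next(parent, elem, new):
    """Hypothetical helper: insert `new` right after `elem` under `parent`.

    `elem.tail` stays attached to `elem`, so `new` ends up after that text.
    """
    parent.insert(list(parent).index(elem) + 1, new)


def add_previous(parent, elem, new):
    """Hypothetical helper: insert `new` right before `elem` under `parent`."""
    parent.insert(list(parent).index(elem), new)


p = ET.fromstring("<p>This is <b>bold</b> and this is </p>")
b = p[0]

i = ET.Element("i")
i.text = "italic"
i.tail = " text."
add_next(p, b, i)

print(ET.tostring(p, encoding="unicode"))
# <p>This is <b>bold</b> and this is <i>italic</i> text.</p>
```

This mirrors the root.insert(root.index(child) + 1, ...) form; the discussion below is about lxml's addnext() of the time not keeping tails in place like this.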
My bad, it's more about the text/tail handling of these two functions. I've clarified the original question:

http://stackoverflow.com/questions/23282241/lxml-difference-between-element-...

Although I'm aware of the comment on addnext(), I still don't quite see how it aligns with what's happening, i.e. the tail is not actually discarded from <b> but appended to the tail of <i>. Why is that? And what's the intention of discarding the tail anyway?

Cheers,
Jens

On Fri, Apr 25, 2014 at 03:10:38PM +0200, Holger Joukl wrote:
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
-- Jens Tröger http://savage.light-speed.de/
Hi,

please don't top-post.

Jens Tröger, 27.04.2014 00:06:
> My bad, it's more about the text/tail handling of these two functions. Although I'm aware of the comment on addnext(), I still don't quite see how it aligns with what's happening, i.e. the tail is not actually discarded from <b> but appended to the tail of <i>. Why is that?
It's actually an oversight in the API that I hadn't even noticed before reading about it now. Originally, the methods were intended for adding a way to work with siblings at the root level, e.g. to add processing instructions before the root element. Their general usefulness is more of a side effect and quite clearly wasn't sufficiently tested. So, yes, the behaviour is inconsistent with the rest of the API and should be aligned. That's totally backwards incompatible, though, and thus rather something to fix in lxml 4.0.
> And what's the intention of discarding the tail anyway?
As documented, it's only discarded at the root level where any tail text isn't allowed anyway (except for whitespace).

Stefan
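The restriction Stefan mentions comes from XML well-formedness: outside the document element, only whitespace, comments and processing instructions may appear, so a non-whitespace tail on the root element has nowhere to go. A minimal check, sketched with the standard library's xml.etree.ElementTree parser rather than lxml (the helper name parses is hypothetical):

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Hypothetical helper: True if xml_text is a well-formed document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Whitespace after the root element is allowed ...
print(parses("<root/>\n  "))                    # True
# ... and so are comments and processing instructions ...
print(parses("<root/><!-- c --><?pi data?>"))   # True
# ... but non-whitespace "tail" text at the top level is not.
print(parses("<root/>trailing text"))           # False
```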
On Sun, Apr 27, 2014 at 09:10:11AM +0200, Stefan Behnel wrote:
> Hi,
>
> please don't top-post.
Okies :)
> Jens Tröger, 27.04.2014 00:06:
> > My bad, it's more about the text/tail handling of these two functions. Although I'm aware of the comment on addnext(), I still don't quite see how it aligns with what's happening, i.e. the tail is not actually discarded from <b> but appended to the tail of <i>. Why is that?
>
> It's actually an oversight in the API that I hadn't even noticed before reading about it now. Originally, the methods were intended for adding a way to work with siblings at the root level, e.g. to add processing instructions before the root element. Their general usefulness is more of a side effect and quite clearly wasn't sufficiently tested.
>
> So, yes, the behaviour is inconsistent with the rest of the API and should be aligned. That's totally backwards incompatible, though, and thus rather something to fix in lxml 4.0.
Thank you for the explanation; that aligns with Ivan's answer in the Stack Overflow thread as well. What now, should I file a work item for this?
> > And what's the intention of discarding the tail anyway?
>
> As documented, it's only discarded at the root level where any tail text isn't allowed anyway (except for whitespace).
But the tail is not really discarded. Instead, the tail of the current node is appended to the tail of the inserted node; it merely moves from one node to the next.

Jens

-- 
Jens Tröger
http://savage.light-speed.de/
Hi,
> > > And what's the intention of discarding the tail anyway?
> >
> > As documented, it's only discarded at the root level where any tail text isn't allowed anyway (except for whitespace).
>
> But the tail is not really discarded. Instead, the tail of the current node is appended to the tail of the inserted node; it merely moves from one node to the next.
I probably should mention that the "when adding at the root level" part doesn't seem to work either. In the example, <p> would be the root level, but the tail is merely moved, and the exact same thing happens deeper down:

    s = "<foo><p>This is <b>bold</b> and this is italic text.</p></foo>"

leads to

    >>> lxml.etree.tostring(xml)
    b'<foo><p>This is <b>bold</b><i>italic</i> text. and this is </p></foo>'
Cheers,
Jens

-- 
Jens Tröger
http://savage.light-speed.de/
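To make the reported behaviour concrete, here is a small model of it using the standard library's xml.etree.ElementTree rather than lxml. It is not lxml's implementation: the helper old_style_add_next is hypothetical, and the input is an assumption chosen to reproduce the output above, with <b> carrying the tail " and this is " and the new <i> element carrying " text.". The current element's tail is appended to the new element's tail instead of being discarded, as described in the thread, and this happens regardless of depth.

```python
import xml.etree.ElementTree as ET


def old_style_add_next(parent, elem, new):
    """Hypothetical model of the reported behaviour, not lxml's code:
    `new` is placed directly after `elem`, and `elem`'s tail is moved to
    the end of `new`'s tail instead of being discarded."""
    new.tail = (new.tail or "") + (elem.tail or "")
    elem.tail = None
    parent.insert(list(parent).index(elem) + 1, new)


# Assumed input: <b>'s tail is " and this is ".
foo = ET.fromstring("<foo><p>This is <b>bold</b> and this is </p></foo>")
p = foo[0]
b = p[0]

i = ET.Element("i")
i.text = "italic"
i.tail = " text."  # assumed tail of the new element
old_style_add_next(p, b, i)

print(ET.tostring(foo, encoding="unicode"))
# <foo><p>This is <b>bold</b><i>italic</i> text. and this is </p></foo>
```

Contrast this with an index-based insert(), where " and this is " stays on <b>; that difference is the inconsistency Stefan acknowledged.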
Hi,
> I probably should mention that the "when adding at the root level" part doesn't seem to work either.
If you look at the source, you'll learn that "at the root level" means (in Cython):

    if self._c_node.parent != NULL and not _isElement(self._c_node.parent):

where

    #define _isElement(c_node) \
            (((c_node)->type == XML_ELEMENT_NODE) || \
             ((c_node)->type == XML_COMMENT_NODE) || \
             ((c_node)->type == XML_ENTITY_REF_NODE) || \
             ((c_node)->type == XML_PI_NODE))
-- 
Best regards,
Ivan
mailto:vano@mail.mipt.ru
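Read as Python, the condition Ivan quotes says an element is "at the root level" when it has a parent and that parent is not element-like, which in practice means the parent is the document node. The sketch below models just that check; the Node class and function names are hypothetical, and the numeric codes are libxml2's xmlElementType values (XML_ELEMENT_NODE = 1, XML_ENTITY_REF_NODE = 5, XML_PI_NODE = 7, XML_COMMENT_NODE = 8, XML_DOCUMENT_NODE = 9). It shows why calling addnext() on <b> in Jens's example never takes the tail-discarding branch: <b>'s parent <p> is an element.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class NodeType(IntEnum):
    # Values follow libxml2's xmlElementType enumeration.
    ELEMENT = 1
    ENTITY_REF = 5
    PI = 7
    COMMENT = 8
    DOCUMENT = 9


@dataclass
class Node:
    """Hypothetical stand-in for a libxml2 node: just a type and a parent."""
    type: NodeType
    parent: Optional["Node"] = None


def is_element_like(node):
    """Mirror of the _isElement() macro quoted above."""
    return node.type in (NodeType.ELEMENT, NodeType.COMMENT,
                         NodeType.ENTITY_REF, NodeType.PI)


def is_at_root_level(node):
    """Mirror of: self._c_node.parent != NULL and not _isElement(parent)."""
    return node.parent is not None and not is_element_like(node.parent)


doc = Node(NodeType.DOCUMENT)
p = Node(NodeType.ELEMENT, parent=doc)  # <p> as the document element
b = Node(NodeType.ELEMENT, parent=p)    # <b> inside <p>

print(is_at_root_level(p))  # True: parent is the document node
print(is_at_root_level(b))  # False: parent <p> is an element
```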
Participants (4): Holger Joukl, Ivan Pozdeev, Jens Tröger, Stefan Behnel