remove an element without removing its tail?
I found a solution that works for this example:

    float.tail = q.tail

Is that what others would do? I still don't know what kind of animal "tail" is. It's a string. I sort of understand that an element is a list. So an XML document is a list with nested lists. That's easy to get. It's harder to see how the string of tail is globbed on to a particular list. Some element.text is the value of a list item. But how does the string "tail" know where it belongs? Of course, as long as it works, I don't really need to know...

I am struggling with a tail problem. Here is the example:

    <sp>
      <speaker>Rainoldes.</speaker>
      <p>You may learne the reason hereof in your <hi>Por‑tesse,</hi> reformed lately by the Pope. In your olde
        <note n="c" place="margin">
          <hi>Portiforium seu breuiarium, ad vsum ecclesiae Sarum: in festo S. Thomae Can‑••ariensis.</hi>
        </note>
        <hi>Portesse</hi> there was this prayer to the Popes martyr, <hi>S. Thomas Bec‑ket of Canterbury:</hi>
        <q>
          <floatingText>
            <body>
              <div type="version">
                <l>Christe Iesu,</l>
                <l>per Thomae vulnera,</l>
                <l>Quae nos ligant.</l>
                <l>relaxa scelera.</l>
              </div>
              <div type="version">
                <l>By Thomas woundes,</l>
                <l>O Christ Iesus,</l>
                <l>Loose thou the sinnes▪</l>
                <l>which do binde vs.</l>
              </div>
            </body>
          </floatingText>
        </q> Or, if you will haue better ryme, with as bad reason:
        <q>
          <pb n="480" facs="tcp:15991:239"/>
          <floatingText>
            <body>
              <div type="version">
                <l>Tu per Thomae sanguinem</l>
                <l>quem pro te impendit,</l>
                <l>Fac nos Christe scandere</l>
                <l>quo Thomas a•cendit.</l>
              </div>
              <div type="version">
                <l>By the blood of Thomas</l>
                <l>which he for thee did spend,</l>
                <l>Make vs O Christ to clime</l>
                <l>whether he did a••end.</l>
              </div>
            </body>
          </floatingText>
        </q>
        <q>
          <l>Mary had a little lamb</l>
        </q>
      </p>
    </sp>

In this example (and many others from a larger collection) I want to strip <q> as an unnecessary wrapper. In this particular case, I could use strip_tags. But this wouldn't work in cases where q does not wrap a floatingText element.

In many cases, it would be simple to do something like

    q.addprevious(floatingText)
    parent = q.getparent()
    parent.remove(q)

But this code removes the tail of q, which I want to keep. How do I remove the empty q element without removing the tail as well?

MM
I found a solution that works for this example:
float.tail = q.tail
I think that's just what you need, provided that the text data in question should directly follow the float element.
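A minimal sketch of that idea, generalised a little: before the wrapper is removed, its text and tail are handed over to whichever neighbour should carry them. The helper name `unwrap` and the shortened sample XML are made up for illustration. The sketch is written against the standard library's xml.etree.ElementTree, which uses the same .text/.tail model, so it runs without lxml; with lxml, `elem.getparent()` could presumably supply the `parent` argument.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, elem):
    """Remove elem from parent, keeping elem's text, children and tail."""
    index = list(parent).index(elem)

    def attach(text, position):
        # Text directly before `position` is stored on the previous
        # sibling's tail, or on parent.text if there is no previous sibling.
        if not text:
            return
        if position == 0:
            parent.text = (parent.text or "") + text
        else:
            prev = parent[position - 1]
            prev.tail = (prev.tail or "") + text

    attach(elem.text, index)
    children = list(elem)
    for offset, child in enumerate(children):
        # Each child keeps its own tail when it is moved up.
        parent.insert(index + offset, child)
    # elem now sits right after its former children.
    attach(elem.tail, index + len(children))
    parent.remove(elem)


# Hypothetical, shortened version of the example from the question.
root = ET.fromstring(
    "<p>this prayer: <q><floatingText>Christe Iesu</floatingText></q>"
    " Or, if you will</p>"
)
unwrap(root, root.find("q"))
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a wrapper with a single child and no text of its own, this boils down to the one-liner above: the tail of q ends up as the tail of the floatingText element.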
Is that what others would do? I still don't know what kind of animal "tail" is. It's a string. I sort of understand that an element is a list. So an XML document is a list with nested lists. That's easy to get. It's harder to see how the string of tail is globbed on to a particular list. Some element.text is the value of a list item. But how does the string "tail" know where it belongs?
The documentation has some hints on this here:

http://lxml.de/tutorial.html#elements-contain-text

Basically, the "classic" approach as found in DOM or XPath is to represent each text in an XML document as a special text node. E.g. for

    <html> <body>Hello<br/>World</body> </html>

the thinking in XPath is that "Hello" and "World" are both text node children of <body>. With XPath you could get at text nodes like this:
    >>> root.xpath('//text()')
    ['Hello', 'World']
And the parent node of these text nodes is:
    >>> root.xpath('//text()/ancestor::node()[1]')
    [<Element body at 0x7f606f1fcb00>]
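For comparison, the same "text node" view can be sketched with the standard library's xml.dom.minidom. That module is not used anywhere in this thread; it only serves to show text appearing as child nodes in their own right. The whitespace between the tags is left out so that only the two interesting text nodes exist.

```python
from xml.dom import minidom

doc = minidom.parseString("<html><body>Hello<br/>World</body></html>")
body = doc.getElementsByTagName("body")[0]

# In the DOM, text nodes and element nodes are siblings in childNodes.
for node in body.childNodes:
    kind = "text" if node.nodeType == node.TEXT_NODE else "element"
    print(kind, node.nodeName, repr(node.nodeValue))

# Both text nodes report <body> as their parent.
text_nodes = [n for n in body.childNodes if n.nodeType == n.TEXT_NODE]
parents = {n.parentNode.nodeName for n in text_nodes}
print(parents)
```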
Whereas ElementTree and therefore lxml model "Hello" as the .text attribute of <body> and "World" as the .tail attribute of <br/>, for the sake of not needing special text node types at all. So the lxml smart string results of the above XPath expression give you the "parent" element for the .text and .tail values like:
    >>> [(result, type(result), result.getparent()) for result in root.xpath('//text()')]
    [('Hello', <class 'lxml.etree._ElementStringResult'>, <Element body at 0x7f606f1fcb00>),
     ('World', <class 'lxml.etree._ElementStringResult'>, <Element br at 0x7f606f1fcf38>)]
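The same attachment can be seen without XPath. The following small sketch uses the standard library's xml.etree.ElementTree, which shares the .text/.tail model with lxml, and also shows why removing an element takes its tail along: the tail is simply an attribute stored on that element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<html><body>Hello<br/>World</body></html>")
body = root.find("body")
br = body.find("br")

print(repr(body.text))  # text between <body> and its first child
print(repr(br.tail))    # text following <br/> up to the next tag
print(repr(br.text))    # <br/> itself contains no text

# Removing <br/> also drops "World", because "World" is stored on <br/>.
body.remove(br)
after_removal = ET.tostring(root, encoding="unicode")
print(after_removal)
```

So the string "knows where it belongs" only in the sense that it hangs off the element it follows; copying it to another element before the removal is what keeps it in the document.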
In many situations this is advantageous in that you don't have to special-case the handling of text nodes all the time, e.g. (as opposed to element nodes) they don't have a tag, can't carry attributes, can't contain children, and whatnot. But I'm sure there are cases where the .text/.tail machinery can get quirky.

Holger
participants (2)
- Holger Joukl
- Martin Mueller