
On Sun, May 13, 2018 at 7:08 AM, Thomas Levine <_@thomaslevine.com> wrote:
Peng Yu writes:
Hi, I'd like to get the text that tostring() would return, but without the outermost tag.
Something like
<foo>abc<em>x<u>y</u>z</em>123<h>f</h>uvw</foo>
should be returned as
abc<em>x<u>y</u>z</em>123<h>f</h>uvw
    import lxml.html

    inner = 'abc<em>x<u>y</u>z</em>123<h>f</h>uvw'
    outer = '<foo>%s</foo>' % inner
    html = lxml.html.fromstring(outer.encode('utf-8'))
    # Leading text of <foo>, then each child serialized together with its tail text.
    result = (
        html.text.encode('utf-8')
        + b''.join(lxml.html.tostring(child) for child in html.getchildren())
    ).decode('utf-8')
    assert result == inner

While removal of the outer tag seems fundamentally incorrect to me, I have at least once had a good reason to do this before.
I just realized that tostring() changes symbols like °. If I just want to strip the outermost tag without changing anything in the inner text, how can I do it? Thanks.

    from lxml import etree
    tree = etree.XML('<foo>25/15°C <bar>abc</bar></foo>')
    print etree.tostring(tree)

The output of the above code is the following.

    <foo>25/15&#176;C <bar>abc</bar></foo>

-- 
Regards,
Peng