
May 13, 2018
11:08 a.m.
Peng Yu writes:
Hi, I'd like to get the text that tostring() would return, but with the outermost tag removed.
Something like
<foo>abc<em>x<u>y</u>z</em>123<h>f</h>uvw</foo>
should be returned as
abc<em>x<u>y</u>z</em>123<h>f</h>uvw
import lxml.html

inner = 'abc<em>x<u>y</u>z</em>123<h>f</h>uvw'
outer = '<foo>%s</foo>' % inner
html = lxml.html.fromstring(outer)
# Leading text (None when the element starts directly with a child), then
# every child serialized together with its tail text (tostring's default).
result = (html.text or '') + ''.join(
    lxml.html.tostring(child, encoding='unicode') for child in html)
assert result == inner

While removing the outer tag seems fundamentally wrong to me, I have had a good reason to do it at least once before.