May 13, 2018
7:10 p.m.
Chris Jerdonek writes:
You don't need a regex. You can just do a string slice html[i:j] after measuring the length of the opening and closing tags you expect. I would also do an assertion that the first and last characters that you're removing are what you expect. Depending on the specifics, it might be necessary for you to clear the attributes of the root element and/or set the tail to None.
How obvious that should have been! This makes things a bit neater than my previous version:

    import lxml.html

    inner = b'abc<em>x<u>y</u>z</em>123<h>f</h>uvw'
    outer = b'<foo>%s</foo>' % inner
    html = lxml.html.fromstring(outer)
    # Leading text of the root, then each child serialized with its tail text.
    result = html.text.encode('utf-8') + b''.join(map(lxml.html.tostring, html))
    assert result == inner