On Sun, May 13, 2018 at 7:31 AM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Sat, May 12, 2018 at 9:58 PM, Peng Yu <pengyu.ut@gmail.com> wrote:
Hi, I'd like to get the text that tostring() would return, but without the outermost tag.
Something like
<foo>abc<em>x<u>y</u>z</em>123<h>f</h>uvw</foo>
should be returned as
abc<em>x<u>y</u>z</em>123<h>f</h>uvw
I could do so by calling tostring() and then using a regex to remove the outermost tag.
You don't need a regex. You can just take a string slice html[i:j] after measuring the lengths of the opening and closing tags you expect. I would also assert that the leading and trailing text you're removing is what you expect. Depending on the specifics, you might need to clear the attributes of the root element and/or set its tail to None.
Could you show me some working code? See also my reply to Thomas Levine. Thanks.

--
Regards,
Peng
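
For reference, here is a minimal sketch of the slicing approach suggested above. It is not code from anyone in this thread. It assumes the standard library's xml.etree.ElementTree rather than lxml, and a root tag with no namespace; the helper name inner_xml is made up for illustration. Instead of clearing the attributes and tail on the original root, it rebuilds a bare root with the same tag, so the opening and closing tags have a known length and the input element is left untouched.

```python
import xml.etree.ElementTree as ET


def inner_xml(elem):
    """Serialize the content of elem without its own start and end tags.

    Hypothetical helper; assumes elem.tag has no namespace.
    """
    # Rebuild the root with no attributes and no tail, so that the
    # serialized opening and closing tags are exactly <tag> and </tag>.
    shell = ET.Element(elem.tag)
    shell.text = elem.text
    # The stdlib ElementTree allows children to be shared between parents.
    shell.extend(elem)

    if not shell.text and len(shell) == 0:
        # An empty element serializes as "<foo />", so there is nothing to slice.
        return ""

    html = ET.tostring(shell, encoding="unicode")
    opening = "<%s>" % shell.tag
    closing = "</%s>" % shell.tag

    # Check that we are removing exactly what we expect to remove.
    assert html.startswith(opening), html
    assert html.endswith(closing), html
    return html[len(opening):len(html) - len(closing)]


root = ET.fromstring("<foo>abc<em>x<u>y</u>z</em>123<h>f</h>uvw</foo>")
print(inner_xml(root))  # abc<em>x<u>y</u>z</em>123<h>f</h>uvw
```

With lxml the same slicing idea should carry over, but an element there has a single parent, so extend() would move the children out of the original tree. Working on a copy.deepcopy() of the root, then clearing its attrib and setting its tail to None as described above, is probably the safer route in that case.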