Hi Brian & Charlie,

I'm not the OP but, FYI, I can see the same issue (on an Intel Mac):

aid@orac tmp % ./tail.py
Python              : sys.version_info(major=3, minor=9, micro=13, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 0, 0)
libxml used         : (2, 9, 14)
libxml compiled     : (2, 9, 14)
libxslt used        : (1, 1, 35)
libxslt compiled    : (1, 1, 35)
b'<form action="action1">\n</form>\n</body>\n</html>\n'

You can see my machine is using libxml2 2.9.14, which is a pity, as in the thread you linked to it looked like the issue would have been resolved in that version...

However, I found that if you update the call to etree.tostring() to use method='html', the trailing </body> and </html> closing tags are no longer included in the output.

i.e.:

print(etree.tostring(nodeList[0], method='html'))

With that update made, the script outputs the desired:

aid@orac tmp % python3 -i tail.py
Python              : sys.version_info(major=3, minor=9, micro=13, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 0, 0)
libxml used         : (2, 9, 14)
libxml compiled     : (2, 9, 14)
libxslt used        : (1, 1, 35)
libxslt compiled    : (1, 1, 35)
b'<form action="action1">\n</form>\n'

I've no idea why this behaviour seems to have changed.

Kind regards

aid

On 7 Jun 2022, at 17:02, Charlie Clark <charlie.clark@clark-consulting.eu> wrote:

On 7 Jun 2022, at 16:56, brian.bird@trustpayments.com wrote:

In more recent versions of lxml the tostring() method can return extra text after the closing tag of the node I've passed to it. So instead of returning

b'\n\n'

it returns

b'\n\n\n\n'

This looks a lot like this https://mail.python.org/archives/list/lxml@python.org/thread/LCTOSIIWGGALAMSZAYHRRYUWYDRESCUO/

Can you update your version of libxml2?

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
