[lxml] Re: Performance issues when using element.clear() in Python 3.x

Feb. 13, 2025


      On 13 Feb 2025, at 15:18, Stefan Behnel via lxml - The Python XML Toolkit wrote:
...
Are you using the same versions of lxml (and libxml2) in both?
There shouldn't be a difference in behaviour, except for the obvious language differences (bytes/unicode).
Based on the parsing code we use in Openpyxl, I'd agree with this. NB: we discovered that for pure parsing, i.e. when you just want to get at the data, the standard library's etree module is often significantly faster, but YMMV.
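
For illustration, a minimal sketch of that kind of "pure parsing" with the standard library's xml.etree.ElementTree, using iterparse() and clear(). The tag name "c", the helper name iter_cell_values and the sample document are made up for this example; this is not the actual Openpyxl code.

```python
import io
import xml.etree.ElementTree as ET


def iter_cell_values(source, tag="c"):
    """Yield the text of each <tag> element, clearing it once it has been read.

    `tag` and this helper are hypothetical names used only for illustration.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            # Drop the element's text, attributes and children once consumed.
            elem.clear()


# Tiny in-memory document standing in for a real file.
sample = io.BytesIO(b"<row><c>1</c><c>2</c><c>3</c></row>")
values = list(iter_cell_values(sample))
```

Note that clear() empties an element but leaves it attached to its parent, so for very large files the parent (or root) may also need clearing from time to time.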
...
Does the memory consumption stay constant over time or does it continuously grow as it parses?
Have you run a memory profiler on your code? Or a (statistical) line profiler to see where the time is spent?
Excellent suggestions: memory_profiler and pympler are useful tools for this.
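
As a rough way to answer the "constant or growing?" question without installing anything, here is a sketch that samples memory during an incremental parse. It uses the standard library's tracemalloc rather than memory_profiler or pympler, and the function name, tag name and sampling interval are assumptions made for the example. tracemalloc only sees allocations made through Python's own allocators, so memory held inside a C library such as libxml2 may not show up in these numbers.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def parse_with_memory_samples(source, tag="c", every=1000):
    """Parse incrementally, recording traced memory every `every` matching elements.

    Returns (element_count, [(count_so_far, bytes_currently_traced), ...]).
    All names here are hypothetical and chosen for illustration.
    """
    samples = []
    count = 0
    tracemalloc.start()
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == tag:
                count += 1
                elem.clear()
                if count % every == 0:
                    current, _peak = tracemalloc.get_traced_memory()
                    samples.append((count, current))
    finally:
        tracemalloc.stop()
    return count, samples


# Synthetic document with 5000 small elements.
doc = b"<row>" + b"<c>x</c>" * 5000 + b"</row>"
count, samples = parse_with_memory_samples(io.BytesIO(doc))
for seen, traced in samples:
    print(f"{seen} elements parsed, {traced} bytes traced")
```

If the traced figure keeps climbing roughly in step with the element count, something is still holding references to the parsed elements.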

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226