
On 13 Feb 2025, at 15:18, Stefan Behnel via lxml - The Python XML Toolkit wrote:
Are you using the same versions of lxml (and libxml2) in both?
There shouldn't be a difference in behaviour, except for the obvious language differences (bytes/unicode).
Based on the parsing code we use in Openpyxl, I'd agree with this. NB: we discovered that for pure parsing, i.e. when you just want to get at the data, the standard library's etree module is often significantly faster, but YMMV.
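
For anyone who wants to check this on their own data, here is a rough timing sketch. It is not the openpyxl code: the sample document, the helper names (make_sample, sum_values, time_parser) and the row count are all made up for illustration. lxml is treated as optional, and its version is printed alongside libxml2's so that results from two environments can be compared like for like.

```python
import io
import time
import xml.etree.ElementTree as stdlib_etree

try:
    from lxml import etree as lxml_etree  # optional; skipped if not installed
except ImportError:
    lxml_etree = None


def make_sample(rows=20000):
    """Build a synthetic XML document (hypothetical test data)."""
    parts = ["<rows>"]
    parts.extend(f'<row id="{i}"><v>{i * 2}</v></row>' for i in range(rows))
    parts.append("</rows>")
    return "".join(parts).encode("utf-8")


def sum_values(etree_module, data):
    """Stream-parse `data` and sum the integer text of every <v> element."""
    total = 0
    for _event, elem in etree_module.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "v":
            total += int(elem.text)
        elif elem.tag == "row":
            elem.clear()  # drop the row's children once they have been read
    return total


def time_parser(etree_module, data, repeats=3):
    """Return (best wall-clock seconds, parsed result) over `repeats` runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = sum_values(etree_module, data)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    data = make_sample()
    seconds, _ = time_parser(stdlib_etree, data)
    print(f"stdlib ElementTree: {seconds:.3f}s")
    if lxml_etree is not None:
        print("lxml", lxml_etree.LXML_VERSION, "libxml2", lxml_etree.LIBXML_VERSION)
        seconds, _ = time_parser(lxml_etree, data)
        print(f"lxml.etree:         {seconds:.3f}s")
```

Treat the numbers as indicative only: they depend heavily on document shape, on what you do with each element, and on the versions involved.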
Does the memory consumption stay constant over time or does it continuously grow as it parses?
Have you run a memory profiler on your code? Or a (statistical) line profiler to see where the time is spent?
Excellent suggestions: memory_profiler and pympler are useful tools for this.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
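
PS: if installing a profiler is not convenient, a quick way to see whether memory stays flat or keeps growing during a parse is to sample the standard library's tracemalloc at intervals. This is only a sketch using tracemalloc as a stand-in for memory_profiler or pympler. The sample data and the names make_sample and memory_samples are invented for illustration. Note that tracemalloc only sees allocations made through Python's own allocators, so memory held by libxml2 under lxml will not show up here; for that a process-level tool is probably the better fit.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_sample(rows=50000):
    """Build a synthetic XML document (hypothetical test data)."""
    parts = ["<rows>"]
    parts.extend(f'<row id="{i}"><v>{i}</v></row>' for i in range(rows))
    parts.append("</rows>")
    return "".join(parts).encode("utf-8")


def memory_samples(data, clear=True, sample_every=10000):
    """Stream-parse `data` and return traced memory in bytes, sampled every
    `sample_every` rows. With clear=False nothing is released while parsing."""
    samples = []
    tracemalloc.start()
    try:
        rows_seen = 0
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
            if elem.tag != "row":
                continue
            rows_seen += 1
            if clear:
                elem.clear()  # release the row's children and attributes
            if rows_seen % sample_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


if __name__ == "__main__":
    data = make_sample()
    for clear in (True, False):
        readings = memory_samples(data, clear=clear)
        print(f"clear={clear}:", [f"{n / 1024:.0f} KiB" for n in readings])
```

If the readings climb steadily from one sample to the next, something is holding on to parsed elements. Even with clear(), the emptied elements stay attached to the root, so some slow growth is expected unless they are also removed from their parent.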