junos-conf-root.xml
<https://drive.google.com/file/d/1mFGxoExLIE7DopNx3uHGdHvQsqHBPFAn/view?usp=…>
Hi All,
I'm chasing an elusive memory leak, and it might be related to lxml.
I hope you can help me understand it better.
When I parse a large XML file and let the resulting tree be garbage collected, the memory is
not freed.
For example, when I run the following code:
import logging
import psutil
import os
import humanize
import gc

LOGGER = logging.getLogger(__name__)


def get_memory_usage(process: psutil.Process) -> int:
    # psutil's "data" figure for this process (Linux/BSD field)
    with process.oneshot():
        return process.memory_full_info().data


def log_mem_diff(process: psutil.Process, message: str) -> int:
    usage = get_memory_usage(process)
    LOGGER.error(f"{message}: {humanize.naturalsize(usage)}")
    return usage


process = psutil.Process(os.getpid())

# Variant 1 (active): the standard library parser
import xml.etree as etree
import xml.etree.ElementTree


def build_tree(xml):
    tree = etree.ElementTree.fromstring(xml)
    log_mem_diff(process, "In_scope")
    # tree goes out of scope here


# Variant 2: lxml (uncomment this and comment out variant 1 to compare)
# import lxml.etree as etree
# def build_tree(xml):
#     parser = etree.XMLParser(remove_blank_text=True, collect_ids=False)
#     tree = etree.XML(xml, parser)
#     log_mem_diff(process, "In_scope")

with open("junos-conf-root.xml", "r") as f:
    xml = f.read()

for i in range(0, 5):
    build_tree(xml)
    log_mem_diff(process, "before gc")
    gc.collect()
    log_mem_diff(process, "after gc")
I get the following output:
In_scope: 1.4 GB
before gc: 1.4 GB
after gc: 1.4 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
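The figures above come from psutil, which reports memory the process holds, whether or not live Python objects still use it. A stdlib-only variant of the experiment may help tell the two apart. This is a sketch under assumptions: a synthetic document from a hypothetical `make_xml` helper stands in for the attached file, and the standard `tracemalloc` module replaces psutil. `tracemalloc` only counts allocations made through Python's own allocators, so it applies to the `xml.etree.ElementTree` path the code above actually runs; allocations libxml2 makes directly with the C library's malloc (as, to my understanding, the lxml variant does) would not show up.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as ET


def make_xml(n_items):
    # Hypothetical synthetic document standing in for junos-conf-root.xml.
    items = "".join(
        "<item id='{0}'><name>n{0}</name></item>".format(i) for i in range(n_items)
    )
    return "<root>" + items + "</root>"


def build_tree(xml_text):
    root = ET.fromstring(xml_text)
    # Return only a count, so no reference to the tree escapes the function.
    return sum(1 for _ in root.iter())


def measure_round(xml_text):
    """Parse once under tracemalloc; return (current, peak) traced bytes."""
    # Restart tracing per round instead of calling tracemalloc.reset_peak(),
    # which only exists on Python 3.9+ (the report above uses 3.8).
    tracemalloc.start()
    try:
        build_tree(xml_text)
        gc.collect()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak


if __name__ == "__main__":
    xml_text = make_xml(50_000)
    for i in range(3):
        current, peak = measure_round(xml_text)
        print("round {}: peak {:,} B, still allocated {:,} B".format(i, peak, current))
```

If "still allocated" drops back to a small value after every round while psutil's figure stays at 1.7 GB, the trees are being freed and the memory is being retained below the Python object level, most likely by the allocator, rather than by live objects.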
This is not a leak per se, but the behavior is unexpected in that:
1. memory usage goes up,
2. running the GC doesn't reduce it, and
3. running the code again doesn't make it keep going up.
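Observation 2 can be narrowed down by checking whether the tree is already gone before `gc.collect()` runs. The sketch uses a hypothetical helper, `freed_by_refcount`, built on the standard `weakref` module and the same `xml.etree.ElementTree` parser as the code above: it disables the cyclic collector and checks whether reference counting alone freed the root element when `build_tree` returned. ElementTree elements do not point back to their parents, so the tree contains no reference cycles and I would expect this to return `True`; in that case `gc.collect()` simply has nothing left to reclaim.

```python
import gc
import weakref
import xml.etree.ElementTree as ET


def build_tree(xml_text, probes):
    root = ET.fromstring(xml_text)
    # A weak reference lets us observe the root without keeping it alive.
    probes.append(weakref.ref(root))
    # root goes out of scope here, as in the original build_tree


def freed_by_refcount(xml_text):
    """Hypothetical helper: True if the tree is gone before any GC pass runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector; only reference counting is left
    try:
        probes = []
        build_tree(xml_text, probes)
        # Calling a dead weak reference returns None.
        return probes[0]() is None
    finally:
        if was_enabled:
            gc.enable()
```

It can be called with the file contents read as in the original script, e.g. `freed_by_refcount(xml)`.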
I'm trying to understand this behavior.
Could you help me with this?
Python : sys.version_info(major=3, minor=8, micro=9, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
Wouter
--
Wouter De Borger
Chief Architect
Inmanta
+32479474994
wouter.deborger(a)inmanta.com
www.inmanta.com
Kapeldreef 60, 3001 Heverlee