I’ve tried the same thing using Python 3.2, since that uses newer versions of lxml and libxml2, and the same problem exists. In fact, I think I’ve traced the problem to lxml.etree.XPath when it compiles an XPath expression. Even disabling smart_strings has no effect.
I’ve also read http://lxml.de/FAQ.html#id1, but I don’t think any of that applies, as I’m not running the threads concurrently. I’m only running one thread at a time (in this example, anyway), and it’s keeping something in memory.
I’ve also been looking at the error log that lxml seems to keep, but I don’t think that applies either: there should be no errors in this case, and the log only seems to hold ~100 messages, whereas in my example the memory just keeps on growing.
Could anyone confirm whether this is really a bug?
Thanks,
from lxml import etree
import threading
import time

def test():
    # Only compile an XPath expression; the result is discarded
    etree.XPath("/Root", smart_strings=False)

for i in range(100000):
    threading.Thread(target=test).start()
    #test() # Using this line instead of starting a thread causes the memory usage to stay constant
    time.sleep(1) # Give plenty of time for the thread to complete

raw_input("finished") # Python 2; on Python 3 this is input()
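Rather than relying on time.sleep(1) to give each thread time to finish, a stricter variant of the loop above joins every worker before starting the next one, so any memory still held afterwards cannot belong to a thread that is still running. This is a minimal sketch: run_in_threads is a hypothetical helper, not part of lxml, and it should run unchanged on Python 2.7 and 3.x.

```python
import threading


def run_in_threads(work, iterations):
    """Call work() once per iteration, each time in a fresh thread.

    Each worker is joined before the next one starts, so at most one
    worker thread exists at any moment.
    """
    baseline = threading.active_count()
    for _ in range(iterations):
        worker = threading.Thread(target=work)
        worker.start()
        worker.join()  # block until this worker has fully finished
    # Extra threads still alive compared with before the loop;
    # 0 means every worker has exited.
    return threading.active_count() - baseline
```

For the script above, the equivalent call would be run_in_threads(test, 100000). A return value of 0 confirms that no worker thread is left alive, so any remaining growth in the process's memory is being held by something other than a running thread.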
From: lxml [mailto:lxml-bounces@lxml.de] On Behalf Of Brian Bird
Sent: 05 November 2013 16:09
To: lxml@lxml.de
Subject: [lxml] Possible memory leak in xpath()
I think I’ve found a memory leak in lxml, and I’m wondering whether anyone can confirm that this is a problem.
I’ve created a simple script to demonstrate it below. Calling the xpath() method on a document seems to leave something in memory if (and only if) the method is called within a thread. Even after the thread completes, the memory doesn’t appear to be freed (it’s not much per call, but it accumulates over time). You can see this by running “ps” and watching the memory usage of the Python script.
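To get a number rather than eyeballing “ps”, a minimal sketch can read the process’s own resident set size. It assumes Linux (as on the Ubuntu 12.04 system described here), where /proc/self/status contains a VmRSS line that roughly matches the RSS column shown by “ps”. The helper name rss_kb is hypothetical and not part of lxml.

```python
def rss_kb(pid=None):
    """Return the resident set size (VmRSS) of a process in kB.

    Linux-only assumption: parses /proc/<pid>/status, defaulting to the
    current process via /proc/self/status.
    """
    path = "/proc/%s/status" % (pid if pid is not None else "self")
    with open(path) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     12345 kB"
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in %s" % path)
```

Calling rss_kb() once before the loop in either script and once after it, then subtracting, gives the growth in kB; with a constant-memory workload the difference should stay small no matter how many iterations run.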
I’m using Ubuntu 12.04, and this happens with the default Python 2.7.3, lxml 2.3.2, libxml2 2.7.8 and libxslt 1.1.26.
I tried upgrading to lxml 3.2.3, but the same thing seems to happen. I’m about to try upgrading the libxml2 package too, but I’d appreciate any suggestions or help; even just confirming whether the same problem happens on your system would be good.
Thanks!
import lxml.etree
import thread # Python 2 only; renamed _thread in Python 3
import time

def test():
    doc = lxml.etree.fromstring("<Root></Root>")
    doc.xpath("/Root") # This line seems to cause memory usage to go up only when used in a thread
    # doc.getchildren() # If this line is used instead the memory usage stays constant

for i in range(100000):
    thread.start_new_thread(test, ())
    #test() # Using this line instead of starting a thread causes the memory usage to stay constant
    time.sleep(1) # Give plenty of time for the thread to complete

raw_input("finished")