I think I’ve found a memory leak in lxml. Can anyone confirm whether this is a real problem?

I’ve written a simple script (below) to demonstrate it. Calling the xpath() method on a document seems to leave something in memory if, and only if, the method is called within a thread. Even after the thread completes, the memory doesn’t appear to be freed. Each leak is small, but it accumulates over time. You can see this by running “ps” and watching the memory usage of the Python process.

I’m using Ubuntu 12.04, and this happens with the default Python 2.7.3, lxml 2.3.2, libxml2 2.7.8 and libxslt 1.1.26.

I tried upgrading to lxml 3.2.3, but the same thing seems to happen. I’m about to try upgrading the libxml2 package too, but I’d appreciate any suggestions or help. Even a confirmation that the same problem happens on someone else’s system would be useful.

Thanks!

import lxml.etree
import thread
import time


def test():
    doc = lxml.etree.fromstring("<Root></Root>")
    doc.xpath("/Root")  # This line seems to cause memory usage to go up only when used in a thread
    # doc.getchildren()  # If this line is used instead the memory usage stays constant


for i in range(100000):
    thread.start_new_thread(test, ())
    # test()  # Using this line instead of starting a thread causes the memory usage to stay constant
    time.sleep(1)  # Give plenty of time for the thread to complete

raw_input("finished")
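
For anyone who wants to reproduce this without watching "ps" by hand, here is a rough sketch of a helper that reports memory growth from inside the process. The names peak_rss() and run_in_threads() are made up for illustration and are not part of lxml. It assumes a Unix system (the resource module) and that threads started with threading.Thread show the same behaviour as thread.start_new_thread(), which I have not verified.

```python
import resource
import threading


def peak_rss():
    """Return the peak resident set size of this process.

    On Linux the value is in kilobytes (on macOS it is in bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_in_threads(func, iterations):
    """Run func once per short-lived thread, one thread at a time.

    Returns (peak RSS before, peak RSS after) so the two can be compared.
    """
    before = peak_rss()
    for _ in range(iterations):
        worker = threading.Thread(target=func)
        worker.start()
        worker.join()  # wait for the thread to finish instead of sleeping
    return before, peak_rss()
```

Calling run_in_threads(test, 10000) with the test() function from the script should return two numbers that drift apart if memory really is being left behind per thread. Because this measures peak usage, it can show growth but cannot show memory being released, so it is best compared against a run that uses the getchildren() line instead.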