[lxml-dev] Huge memory leak in latest 2.0
Hi. I'm using the latest 2.0 version from trunk, rev. 49494 (because it supports the 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a loop, 100-200kB each. I have noticed that the memory used by my program increases by about 1MB after each document processed, so after a few hundred passes the system is about to hang. Running the same code with lxml 1.3.6 doesn't cause such memory growth.

I'm using the following library calls:

- `tree = etree.parse(<opened file>, HTMLParser(encoding=...))`
- `etree.tostring(tree)`
- `el.xpath(...)`
- getting children and attributes of elements

I'm using libxml2 version 2.6.28. If anyone knows of a solution or workaround, please write.

Regards,
Artur
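(Not part of the original mail.) One way to quantify this kind of per-document growth is a small stdlib-only harness; the lxml calls listed above would go inside `work()`, which here is just a stand-in workload. This is a sketch under two assumptions: a Unix platform (the `resource` module), and Linux's convention that `ru_maxrss` is reported in kilobytes.

```python
# Sketch of a leak check (assumptions: Unix; ru_maxrss is in KB on Linux).
# Replace the stand-in `work` with the real per-document processing,
# e.g. the etree.parse(...)/tostring(...)/xpath(...) calls from the report.
import resource

def peak_rss_kb():
    # Peak resident set size of this process so far (only ever grows).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def measure(work, iterations=10):
    """Run `work` repeatedly and record peak RSS after each pass."""
    samples = []
    for _ in range(iterations):
        work()
        samples.append(peak_rss_kb())
    return samples

if __name__ == "__main__":
    samples = measure(lambda: [bytearray(1024) for _ in range(100)])
    # A steady climb of roughly the same amount per pass, as described
    # in the report (~1MB per document), points at a leak.
    print(samples)
```

Since `ru_maxrss` is a high-water mark, a leak-free workload shows a flat curve after the first pass, while a leak keeps pushing the peak up on every iteration.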
Hi,

Artur Siekielski wrote:
> I'm using latest 2.0 version from trunk, rev. 49494 (because it supports 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a loop, 100-200kB each. I have noticed that memory used by my program increases about 1MB after each document processed, so after a few hundred passes the system is about to hang. Running the same code with lxml 1.3.6 doesn't cause such memory usage increase.
>
> I'm using the following library calls: `tree = etree.parse(<opened file>, HTMLParser(encoding=...))`, `etree.tostring(tree)`, `el.xpath(...)`, getting children and attributes of elements
Thanks for the report, I can reproduce this with a simple call to the parser. I'll look into it.

Stefan
Artur Siekielski wrote:
> I'm using latest 2.0 version from trunk, rev. 49494 (because it supports 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a loop, 100-200kB each. I have noticed that memory used by my program increases about 1MB after each document processed, so after a few hundred passes the system is about to hang. Running the same code with lxml 1.3.6 doesn't cause such memory usage increase.
>
> I'm using the following library calls: `tree = etree.parse(<opened file>, HTMLParser(encoding=...))`, `etree.tostring(tree)`, `el.xpath(...)`, getting children and attributes of elements
>
> I'm using libxml2 version 2.6.28.
>
> If anyone knows about solution/workaround, please write.
Hmmm, weird. The problem doesn't result from any change in lxml, just from the switch to Cython 0.9.6.8+. And I don't even see any obvious problem in the generated code. Anyway, here's a patch that seems to make the leak go away on my side. Could you give it a try?

Stefan
Stefan Behnel wrote:
> Artur Siekielski wrote:
>> I'm using latest 2.0 version from trunk, rev. 49494 (because it supports 'encoding' keyword in HTMLParser). I'm parsing many HTML documents in a loop, 100-200kB each. I have noticed that memory used by my program increases about 1MB after each document processed, so after a few hundred passes the system is about to hang. Running the same code with lxml 1.3.6 doesn't cause such memory usage increase.
>>
>> I'm using the following library calls: `tree = etree.parse(<opened file>, HTMLParser(encoding=...))`, `etree.tostring(tree)`, `el.xpath(...)`, getting children and attributes of elements
>>
>> I'm using libxml2 version 2.6.28.
>>
>> If anyone knows about solution/workaround, please write.
>
> Hmmm, weird. The problem doesn't result from any change in lxml, just from the switch to Cython 0.9.6.8+. And I don't even see any obvious problem in the generated code.
I fixed the problem in Cython (and Pyrex). It should work with the next release. I attached the patch that I used, in case you want to build lxml yourself using Cython.

Stefan
participants (2)

- Artur Siekielski
- Stefan Behnel