Ed Singleton, 06.11.2013 19:49:
On 6 Nov 2013, at 18:25, Stefan Behnel wrote:
Stefan Behnel, 06.11.2013 18:36:
Brian Bird, 06.11.2013 16:01:
Could anyone confirm if this is really a bug?
Definitely. Thanks for investigating it, and sorry for not responding earlier. Your short test code makes this very easy to reproduce, and Amaury's hint at the XPath parser dict should make it easy to track down the problem in the code.
https://github.com/lxml/lxml/commit/f7d2682a511253445c128137f205bfb4d6973cbb
It turned out that the parser dict setup risked using the wrong dict anyway, so the safest fix was to not use a dictionary at all.
Would this be likely to cause a memory leak in non-threaded code?
I don't think so. Single-threaded code should always use the same (global) dictionary, so it can't leak memory more than once. The leak here seemed to be a couple of bytes per thread, which is so tiny that you wouldn't even notice it with only one thread.
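Since the leak is per thread, one way to make it visible is to run the same small operation in many short-lived threads and watch the process's peak memory. What follows is only a rough sketch using the standard library, not the test code from this thread: measure_growth, run_in_threads and the work callable are hypothetical names, and the lxml workload in the comment is an assumed example of what one might plug in.

```python
import resource
import threading


def peak_rss():
    # Peak resident set size of this process.
    # Units are platform dependent (kilobytes on Linux, bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_in_threads(work, n_threads):
    # Run work() once in each of n_threads short-lived threads,
    # one after the other, so every call sees a fresh thread.
    for _ in range(n_threads):
        t = threading.Thread(target=work)
        t.start()
        t.join()


def measure_growth(work, n_threads=1000, warmup=10):
    # Warm up first so one-time allocations are not counted as a leak.
    run_in_threads(work, warmup)
    before = peak_rss()
    run_in_threads(work, n_threads)
    return peak_rss() - before


# Assumed workload, if lxml is installed (not taken from the original report):
#
#   from lxml import etree
#   doc = etree.XML("<a><b/></a>")
#   growth = measure_growth(lambda: doc.xpath("//b"), n_threads=100000)
#
# Growth that scales with n_threads hints at a per-thread leak.
```

With only a couple of bytes per thread, the thread count has to be large before the difference rises above noise, and a tool like valgrind remains the more reliable check.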
I've been investigating a memory leak in an application that makes very heavy use of lxml, and I would be delighted if this was the cause.
As you've seen, the best way to get it fixed is to invest the time to strip it down to just a couple of operations. That's work, sure, but someone has to invest it. You should also make sure that you are using the latest libxml2/libxslt. And you could run your program through valgrind, which can detect memory that doesn't get freed. The Makefile in lxml's sources has an example command line.
Stefan
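For the version check mentioned above, lxml exposes the library versions it was built against and is running with as tuples on lxml.etree. A small sketch, where lxml_versions is a hypothetical helper name and the function simply returns None if lxml is not importable:

```python
def lxml_versions():
    # Return the lxml, libxml2 and libxslt versions, or None without lxml.
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
    }


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed")
    else:
        for name, version in versions.items():
            print(name, ".".join(str(part) for part in version))
```

A mismatch between the compiled and runtime versions is worth mentioning in a bug report, since the library actually loaded may differ from the one lxml was built against.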