As I said, my approach to debugging this kind of thing is to get out of
Python ASAP. Once you manage to reproduce the problem by calling only a
couple of Python functions, you can use massif or memcheck.
I agree with you; the problem is that I do not call C functions directly, and I do not know how to reproduce the result with a minimal example.
I once had a NumPy problem that was solved by recompiling NumPy completely, but here I do not have the first clue to help me track down this bug.
The reason to get out of Python is that tools like valgrind can only
give you meaningful information at the C level, and it is difficult, if
not impossible in all but trivial cases, to make the link between C and
Python calls.
But once you can get this kind of graph:
http://valgrind.org/docs/manual/ms-manual.html
Then the problem is "solved", at least in my limited experience.
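A minimal sketch of the "reproduce it with only a couple of Python functions" step, assuming a Unix system (the resource module and valgrind are not available on Windows). suspect_call is a hypothetical placeholder for the one or two library calls you suspect; here it just allocates a buffer so the script runs as-is. The script repeats the call and reports how much the peak resident memory grew. If the growth keeps rising as you increase the iteration count, you have a small reproducer, which you can then run under valgrind, e.g. `valgrind --tool=massif python repro.py` followed by `ms_print massif.out.<pid>`, or `valgrind --tool=memcheck --leak-check=full python repro.py` (repro.py is a hypothetical file name). On Python 3.6+, setting PYTHONMALLOC=malloc makes memcheck output less noisy.

```python
import gc
import resource
import sys


def suspect_call():
    # Hypothetical placeholder: replace with the one or two library calls
    # you believe trigger the leak. This stand-in allocates and frees 8 MB.
    buf = bytearray(8 * 1024 * 1024)
    return len(buf)


def peak_rss_kb():
    # ru_maxrss is the peak resident set size: kilobytes on Linux,
    # bytes on macOS, so normalize to kilobytes.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak // 1024 if sys.platform == "darwin" else peak


def main(iterations=200):
    suspect_call()  # warm-up: the first call may allocate caches
    gc.collect()
    before = peak_rss_kb()
    for _ in range(iterations):
        suspect_call()
    gc.collect()
    after = peak_rss_kb()
    # Peak RSS never decreases, so the difference is always >= 0.
    print(f"peak RSS grew by {after - before} kB over {iterations} calls")
    return after - before


if __name__ == "__main__":
    main()
```

Peak RSS is only a coarse signal; its purpose is to confirm that the problem survives in a tiny script before handing that script to massif or memcheck.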