2016-02-12 14:31 GMT+01:00 M.-A. Lemburg email@example.com:
Sorry, your email must have gotten lost in my inbox.
Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).
It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.
Ok, I will try my patch on some of them. Thanks for the pointers.
I suppose such a flag would create a noticeable runtime performance hit, since the compiler would no longer be able to inline the PyMem_*() APIs if you redirect those APIs to other sets at runtime.
Hum, I think that you missed PEP 445. The overhead of this PEP was discussed and considered negligible enough to implement it: https://www.python.org/dev/peps/pep-0445/#performances
With PEP 445, there is no overhead to enabling debug hooks at runtime (except the overhead of the debug checks themselves ;-)).
PyMem_Malloc now calls a pointer: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l319
Same for PyObject_Malloc: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l380
I also don't see much point in carrying around such baggage in production builds of Python, since you'd most likely only want to use the tools to debug C extensions during their development.
I propose adding an environment variable because it's rare that a debug build is installed on a system. Usually, using a debug build requires recompiling all C extensions, which is not really... convenient...
With such an env var, it would be trivial to quickly check whether the Python memory allocators are used correctly.
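As a sketch of the intended workflow, assuming the variable were called PYTHONMALLOC=debug (the name this idea eventually took in Python 3.6; the email itself does not name it): you run your C extension under a regular release build and get the PyMem debug checks without recompiling anything.

```shell
# Enable the debug hooks on a regular (release) build at startup;
# any misuse of the PyMem_*()/PyObject_*() APIs in imported C
# extensions is then detected at runtime.
PYTHONMALLOC=debug python3 -c "import json; print('ok')"
```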
Runtime performance, difference in memory consumption (arenas cannot be freed if there are still small chunks allocated), memory locality. I'm no expert in this, so can't really comment much.
"arenas cannot be freed if there are still small chunks allocated" yeah, this is called memory fragmentation.
There is a big difference between libc malloc() and pymalloc for small allocations: pymalloc is able to free an arena using munmap(), which immediately releases the memory to the system, whereas most implementations of malloc() use a single contiguous memory block which is only shrunk when all memory "at the top" is free. So it's the same fragmentation issue that you described, except that malloc() uses a single arena of arbitrary size (between 1 MB and 10 GB, there is no limit), whereas pymalloc uses small arenas of 256 KB.
In short, I expect less fragmentation with pymalloc.
"memory locality": I have no idea on that. I guess that it can be seen on benchmarks. pymalloc is designed for objects with short lifetime.