[Python-Dev] Modify PyMem_Malloc to use pymalloc for performance

Victor Stinner victor.stinner at gmail.com
Fri Feb 12 10:07:21 EST 2016


Hi,

2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
> Sorry, your email must gotten lost in my inbox.

no problemo


> Yes, but those are part of the stdlib. You'd need to check
> a few C extensions which are not tested as part of the stdlib,
> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom
> types in C since these will often need the memory management
> APIs).
>
> It may also be a good idea to check wrapper generators such
> as cython, swig, cffi, etc.

Ok, I will try my patch on some of them. Thanks for the pointers.


> I suppose such a flag would create a noticeable runtime
> performance hit, since the compiler would no longer be
> able to inline the PyMem_*() APIs if you redirect those
> APIs to other sets at runtime.

Hum, I think that you missed PEP 445. Its overhead was discussed and
considered negligible enough to accept the PEP:
https://www.python.org/dev/peps/pep-0445/#performances

With PEP 445, enabling debug hooks at runtime adds no overhead
(except for the overhead of the debug checks themselves ;-)).

PyMem_Malloc now calls through a function pointer:
https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l319

Same for PyObject_Malloc:
https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l380
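
To illustrate the indirection, here is a minimal sketch in plain C. It is
not the code from obmalloc.c: the demo_* names and the struct layout are
made up for this example and only model the idea of a context pointer plus
function pointers that can be swapped at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified allocator table (not CPython's real struct). */
typedef struct {
    void *ctx;
    void *(*alloc_fn)(void *ctx, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocator;

static void *raw_alloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size ? size : 1);   /* never ask malloc() for 0 bytes */
}

static void raw_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* The allocator currently in use. Callers only ever go through it. */
static demo_allocator current = { NULL, raw_alloc, raw_free };

void *demo_malloc(size_t size) { return current.alloc_fn(current.ctx, size); }
void demo_free(void *ptr) { current.free_fn(current.ctx, ptr); }

/* Debug hook: wraps the previous allocator and counts live blocks. */
typedef struct {
    demo_allocator wrapped;
    size_t live_blocks;
    int installed;
} demo_debug_state;

static demo_debug_state debug_state;

static void *debug_alloc(void *ctx, size_t size)
{
    demo_debug_state *st = ctx;
    void *ptr = st->wrapped.alloc_fn(st->wrapped.ctx, size);
    if (ptr != NULL)
        st->live_blocks++;
    return ptr;
}

static void debug_free(void *ctx, void *ptr)
{
    demo_debug_state *st = ctx;
    if (ptr != NULL) {
        assert(st->live_blocks > 0);  /* more frees than allocations */
        st->live_blocks--;
    }
    st->wrapped.free_fn(st->wrapped.ctx, ptr);
}

/* Swap the hooks in at runtime. Installing twice is a no-op. */
void demo_install_debug_hooks(void)
{
    if (debug_state.installed)
        return;
    debug_state.wrapped = current;
    debug_state.live_blocks = 0;
    debug_state.installed = 1;
    current.ctx = &debug_state;
    current.alloc_fn = debug_alloc;
    current.free_fn = debug_free;
}

size_t demo_live_blocks(void) { return debug_state.live_blocks; }
```

Code calling demo_malloc() pays for one indirect call whether or not the
hooks are installed. In this sketch, blocks allocated before the hooks are
installed should also be freed before, since the counter does not know them.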


> I also don't see much point in carrying around such
> baggage in production builds of Python, since you'd most
> likely only want to use the tools to debug C extensions during
> their development.

I propose adding an environment variable because a debug build is
rarely installed on the system. Usually, using a debug build requires
recompiling all C extensions, which is not really...
convenient...

With such an env var, it would be trivial to quickly check whether the
Python memory allocators are used correctly.
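
A rough sketch of the idea, in plain C. Everything here is an assumption
made for illustration: the variable name DEMO_MALLOC_DEBUG, the demo_*
functions and the 0xCB fill byte are not part of any proposal or of
CPython's API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, used only for this sketch. */
#define DEMO_ENV_VAR "DEMO_MALLOC_DEBUG"

/* Unset, empty or "0" means off; any other value means on. */
int demo_debug_requested(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

typedef void *(*demo_malloc_fn)(size_t size);

static void *plain_malloc(size_t size)
{
    return malloc(size ? size : 1);
}

/* "Debug" variant: fill new memory with a recognisable byte so that
   reads of uninitialized memory are easier to spot. */
static void *checked_malloc(size_t size)
{
    void *ptr = plain_malloc(size);
    if (ptr != NULL)
        memset(ptr, 0xCB, size);
    return ptr;
}

/* Pick the allocator from the raw value of the variable. */
demo_malloc_fn demo_select_malloc(const char *env_value)
{
    return demo_debug_requested(env_value) ? checked_malloc : plain_malloc;
}

/* What a startup routine might do: read the environment once. */
demo_malloc_fn demo_malloc_from_environment(void)
{
    return demo_select_malloc(getenv(DEMO_ENV_VAR));
}
```

The decision is taken once at startup, so the same binary and the same
compiled C extensions can run with or without the checks.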


> Runtime performance, difference in memory consumption (arenas
> cannot be freed if there are still small chunks allocated),
> memory locality. I'm no expert in this, so can't really
> comment much.

"arenas cannot be freed if there are still small chunks allocated"
yeah, this is called memory fragmentation.

There is a big difference between libc malloc() and pymalloc for small
allocations: pymalloc is able to free an arena using munmap(), which
immediately releases the memory to the system, whereas most
implementations of malloc() use a single contiguous memory block which
is only shrunk when all memory "at the top" is free. So malloc() has the
same fragmentation issue that you described, except that it uses a
single arena of arbitrary size (from 1 MB to 10 GB, there is no
limit), whereas pymalloc uses small arenas of 256 KB.

In short, I expect less fragmentation with pymalloc.
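
A toy model of that argument, in plain C. It is only a sketch under strong
assumptions: blocks have equal size and are laid out in address order, and
the two functions (hypothetical names) ignore everything a real malloc()
or pymalloc does beyond the rule for giving memory back.

```c
#include <assert.h>
#include <stddef.h>

/* live[i] != 0 means block i (in address order) is still allocated. */

/* Arena model: blocks are grouped into fixed-size arenas, and an arena
   can be returned to the system once every block in it is free. */
size_t releasable_with_arenas(const unsigned char *live, size_t nblocks,
                              size_t blocks_per_arena)
{
    size_t released = 0;
    assert(blocks_per_arena > 0);
    for (size_t start = 0; start < nblocks; start += blocks_per_arena) {
        size_t end = start + blocks_per_arena;
        int empty = 1;
        if (end > nblocks)
            end = nblocks;
        for (size_t i = start; i < end; i++) {
            if (live[i]) {
                empty = 0;
                break;
            }
        }
        if (empty)
            released += end - start;
    }
    return released;
}

/* Single-heap model: only the run of free blocks at the top of the
   heap can be trimmed and given back. */
size_t releasable_with_single_heap(const unsigned char *live, size_t nblocks)
{
    size_t released = 0;
    while (released < nblocks && !live[nblocks - 1 - released])
        released++;
    return released;
}
```

With 16 blocks, arenas of 4 blocks and a single survivor in the last
block, the arena model can release 12 blocks while the single-heap model
can release none, because the survivor pins the top of the heap.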

"memory locality": I have no idea about that. I guess that it would show
up in benchmarks. pymalloc is designed for objects with a short lifetime.

Victor

