There is an old discussion about the performance of the PyMem_Malloc() memory allocator. CPython stresses memory allocators a lot. The last time I collected statistics, it was for PEP 454: "For example, the Python test suites calls malloc(), realloc() or free() 270,000 times per second in average." https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator
I proposed a simple change: modify PyMem_Malloc() to use the pymalloc allocator, which is faster for allocations of 512 bytes or less, and fall back to malloc() (which is the current internal allocator of PyMem_Malloc()) for larger requests.
This tiny change makes Python up to 6% faster on some specific (macro) benchmarks, and it doesn't seem to make Python slower on any benchmark: http://bugs.python.org/issue26249#msg259445
Do you see any drawback of using pymalloc for PyMem_Malloc()?
Does anyone recall the rationale for having two families of memory allocators?
FYI Python has had 3 families since 3.4: PyMem and PyObject, but also PyMem_Raw! https://www.python.org/dev/peps/pep-0445/
Since pymalloc is only used for small memory allocations, I understand that small objects will no longer be allocated on the heap, but only in pymalloc arenas, which are allocated by mmap(). The advantage of arenas is that it's possible to "punch holes" in memory when a whole arena is freed, whereas the heap has the famous "fragmentation" issue because it is a single contiguous memory block.
The libc malloc() uses mmap() for allocations larger than a threshold, which is now dynamic and initialized to 128 kB or 256 kB by default (I don't recall the exact default value).
Is there a risk of *higher* memory fragmentation if we start using pymalloc for PyMem_Malloc()? Does anyone know how to test it?