[Python-Dev] Modify PyMem_Malloc to use pymalloc for performance

Wed Feb 3 16:03:41 EST 2016

Hi,

There is an old discussion about the performance of the PyMem_Malloc()
memory allocator. CPython puts a lot of stress on memory allocators. The
last time I collected statistics was for PEP 454:
"For example, the Python test suites calls malloc() , realloc() or
free() 270,000 times per second in average."
https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator

I proposed a simple change: modify PyMem_Malloc() to use the pymalloc
allocator, which is faster for allocations smaller than 512 bytes, and
fall back to malloc() (which is the current internal allocator of
PyMem_Malloc()) for larger ones.
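
To make the proposed dispatch concrete, here is a minimal sketch in Python
rather than C. It only models the decision and is not the actual patch. The
names SMALL_REQUEST_THRESHOLD and choose_allocator are illustrative, and I
assume that a request of exactly 512 bytes is still served by pymalloc and
that a zero-size request goes to malloc():

```python
# Illustrative model only: this is not CPython code.
# Assumption: requests of 1..512 bytes are served by pymalloc.
SMALL_REQUEST_THRESHOLD = 512  # bytes


def choose_allocator(nbytes):
    """Return the name of the allocator the proposed PyMem_Malloc() would use."""
    if 0 < nbytes <= SMALL_REQUEST_THRESHOLD:
        return "pymalloc"
    # Larger (and, by assumption, zero-size) requests fall back to malloc().
    return "malloc"


for size in (16, 512, 513, 4096):
    print(size, choose_allocator(size))
```

The point is that only the small-request path changes; anything above the
threshold still ends up in malloc(), as it does today.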

This tiny change makes Python up to 6% faster on some specific (macro)
benchmarks, and it doesn't seem to make Python slower on any
benchmark:
http://bugs.python.org/issue26249#msg259445

Do you see any drawback of using pymalloc for PyMem_Malloc()?

Does anyone recall the rationale for having two families of memory allocators?

FYI Python has had 3 families since 3.4: PyMem, PyObject, and also PyMem_Raw!
https://www.python.org/dev/peps/pep-0445/

--

Since pymalloc is only used for small memory allocations, I understand
that small objects will no longer be allocated on the heap, but
only in pymalloc arenas, which are allocated by mmap(). The advantage of
arenas is that it's possible to "punch holes" in the memory when a
whole arena is freed, whereas the heap has the famous
"fragmentation" issue because it is a single contiguous memory
block.
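
A toy model of that difference, as a sketch only: it ignores allocator
metadata, free lists and pools, and it assumes a 256 kB arena size and a
hypothetical 64-byte block size. The function names are made up for
illustration. The contiguous heap can only give back the space above its
highest live block, whereas each empty arena can be released on its own:

```python
# Toy model, not a measurement of a real allocator.
ARENA_SIZE = 256 * 1024  # bytes; assumed arena size for this model
BLOCK_SIZE = 64          # bytes; hypothetical small allocation size


def heap_bytes_held(live_flags):
    """Contiguous heap: only space above the highest live block can be returned."""
    highest_live = max(
        (i for i, live in enumerate(live_flags) if live), default=-1
    )
    return (highest_live + 1) * BLOCK_SIZE


def arena_bytes_held(live_flags):
    """Arenas: any arena without a live block can be released independently."""
    blocks_per_arena = ARENA_SIZE // BLOCK_SIZE
    used_arenas = {
        i // blocks_per_arena for i, live in enumerate(live_flags) if live
    }
    return len(used_arenas) * ARENA_SIZE


# Allocate 100000 blocks in order, then free all of them except the last one.
n_blocks = 100000
live = [False] * n_blocks
live[-1] = True

heap_held = heap_bytes_held(live)
arena_held = arena_bytes_held(live)
print("contiguous heap still holds:", heap_held, "bytes")
print("arenas still hold:", arena_held, "bytes")
```

In this model a single surviving block pins the whole heap, while the arena
model keeps only the one arena that contains it.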

The libc malloc() uses mmap() for allocations larger than a threshold,
which is now dynamic and initialized to 128 kB or 256 kB by default
(I don't recall the exact default value).

Is there a risk of *higher* memory fragmentation if we start to use
pymalloc for PyMem_Malloc()? Does anyone know how to test for it?

Victor