Modify PyMem_Malloc to use pymalloc for performance

Hi,

There is an old discussion about the performance of the PyMem_Malloc() memory allocator. CPython stresses memory allocators a lot. Last time I made statistics, it was for PEP 454: "For example, the Python test suites calls malloc() , realloc() or free() 270,000 times per second in average." https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator

I proposed a simple change: modify PyMem_Malloc() to use the pymalloc allocator, which is faster for allocations smaller than 512 bytes, or fall back to malloc() (which is the current internal allocator of PyMem_Malloc()).

This tiny change makes Python up to 6% faster on some specific (macro) benchmarks, and it doesn't seem to make Python slower on any benchmark: http://bugs.python.org/issue26249#msg259445

Do you see any drawback of using pymalloc for PyMem_Malloc()?

Does anyone recall the rationale to have two families of memory allocators? FYI Python has had 3 families since 3.4: PyMem, PyObject but also PyMem_Raw! https://www.python.org/dev/peps/pep-0445/

--

Since pymalloc is only used for small memory allocations, I understand that small objects will no longer be allocated on the heap, but only in pymalloc arenas, which are allocated by mmap. The advantage of arenas is that it's possible to "punch holes" in the memory when a whole arena is freed, whereas the heap has the famous "fragmentation" issue because it is a single contiguous memory block.

The libc malloc() uses mmap() for allocations larger than a threshold which is now dynamic, and initialized to 128 kB or 256 kB by default (I don't recall exactly the default value).

Is there a risk of *higher* memory fragmentation if we start to use pymalloc for PyMem_Malloc()? Does someone know how to test it?

Victor
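For illustration, a minimal sketch of the proposed dispatch; this is an assumption for clarity, not the actual patch, and pymalloc_alloc() is a hypothetical helper name:

    /* Sketch: serve small requests from pymalloc arenas, fall back to
       malloc() beyond pymalloc's 512-byte small-object cutoff. */
    #include <stdlib.h>

    #define SMALL_REQUEST_THRESHOLD 512   /* pymalloc's real cutoff */

    void *pymalloc_alloc(size_t size);    /* hypothetical: allocate from an arena */

    void *
    sketch_PyMem_Malloc(size_t size)
    {
        if (size == 0)
            size = 1;   /* PyMem_Malloc(0) must return a unique, freeable pointer */
        if (size <= SMALL_REQUEST_THRESHOLD)
            return pymalloc_alloc(size);
        return malloc(size);               /* large requests stay on the system allocator */
    }

In practice the change described in the thread amounts to pointing the PyMem allocator table at the pymalloc-backed functions rather than duplicating logic like this.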

There is an old discussion about the performance of the PyMem_Malloc() memory allocator.
Oops, I forgot to mention that my patch is a follow-up to a previous patch showing a nice speedup on dict: http://bugs.python.org/issue23601 (but I did say so in my issue ;-)). Well, see http://bugs.python.org/issue26249 for the longer context. 2016-02-03 22:03 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
Does anyone recall the rationale to have two families of memory allocators?
I asked Mercurial, and I found the change adding PyMem_Malloc():

---
branch: legacy-trunk
user: Guido van Rossum <guido@python.org>
date: Tue Aug 05 01:59:22 1997 +0000
files: Include/mymalloc.h
description: Added Py_Malloc and friends as well as PyMem_Malloc and friends.
---

As expected, it's old, as is the change adding PyObject_Malloc():

---
changeset: 12576:1c7c2dd1beb1
branch: legacy-trunk
user: Guido van Rossum <guido@python.org>
date: Wed May 03 23:44:39 2000 +0000
files: Include/mymalloc.h Include/objimpl.h Modules/_cursesmodule.c Modules/_sre.c Modules/_tkinter.c Modules/almodule.c Modules/arraymodule.c Modules/bsddbmodule.
description: Vladimir Marangozov's long-awaited malloc restructuring. For more comments, read the patches@python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.)
---

Victor

On 03.02.2016 22:03, Victor Stinner wrote:
Hi,
There is an old discussion about the performance of the PyMem_Malloc() memory allocator. CPython stresses memory allocators a lot. Last time I made statistics, it was for PEP 454: "For example, the Python test suites calls malloc() , realloc() or free() 270,000 times per second in average." https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator
I proposed a simple change: modify PyMem_Malloc() to use the pymalloc allocator, which is faster for allocations smaller than 512 bytes, or fall back to malloc() (which is the current internal allocator of PyMem_Malloc()).
This tiny change makes Python up to 6% faster on some specific (macro) benchmarks, and it doesn't seem to make Python slower on any benchmark: http://bugs.python.org/issue26249#msg259445
Do you see any drawback of using pymalloc for PyMem_Malloc()?
Yes: You cannot free memory allocated using pymalloc with the standard C lib free(). It would be better to go through the list of PyMem_*() calls in Python and replace them with PyObject_*() calls, where possible.
Does anyone recall the rationale to have two families of memory allocators?
The PyMem_*() APIs were needed to have a cross-platform malloc() implementation which returns standard C lib free()able memory, but also behaves well when passing 0 as size.

-- Marc-Andre Lemburg
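A sketch of the zero-size guarantee described above; an assumed illustration, not the actual CPython code:

    /* malloc(0) may legally return NULL; the PyMem_*() family instead
       guarantees a unique pointer that can later be passed to free(). */
    #include <stdlib.h>

    void *
    portable_malloc(size_t size)
    {
        if (size == 0)
            size = 1;
        return malloc(size);
    }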

Hi, 2016-02-04 11:17 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Do you see any drawback of using pymalloc for PyMem_Malloc()?
Yes: You cannot free memory allocated using pymalloc with the standard C lib free().
That's not completely new. If Python is compiled in debug mode, you get a fatal error with a huge error message if you free the memory allocated by PyMem_Malloc() using PyObject_Free() or PyMem_RawFree(). But yes, technically it's possible to use free() when Python is *not* compiled in debug mode.
It would be better to go through the list of PyMem_*() calls in Python and replace them with PyObject_*() calls, where possible.
There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free(). I would prefer to modify a single place rather than having to replace 536 calls :-/
Does anyone recall the rationale to have two families of memory allocators?
The PyMem_*() APIs were needed to have a cross-platform malloc() implementation which returns standard C lib free()able memory, but also behaves well when passing 0 as size.
Yeah, PyMem_Malloc() & PyMem_Free() help to have portable behaviour. But why weren't PyObject_Malloc() & PyObject_Free() used in the first place?

An explanation could be that PyMem_Malloc() can be called without the GIL held. But it wasn't true before Python 3.4, since PyMem_Malloc() called (indirectly) PyObject_Malloc() when Python was compiled in debug mode, and PyObject_Malloc() requires the GIL to be held.

When I wrote PEP 445, there was a discussion about the GIL. It was proposed to allow calling PyMem_xxx() without the GIL: https://www.python.org/dev/peps/pep-0445/#gil-free-pymem-malloc

This option was rejected.

Victor

On 04.02.2016 13:29, Victor Stinner wrote:
Hi,
2016-02-04 11:17 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Do you see any drawback of using pymalloc for PyMem_Malloc()?
Yes: You cannot free memory allocated using pymalloc with the standard C lib free().
That's not completely new.
If Python is compiled in debug mode, you get a fatal error with a huge error message if you free the memory allocated by PyMem_Malloc() using PyObject_Free() or PyMem_RawFree().
But yes, technically it's possible to use free() when Python is *not* compiled in debug mode.
Debug mode is a completely different beast ;-)
It would be better to go through the list of PyMem_*() calls in Python and replace them with PyObject_*() calls, where possible.
There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free().
I would prefer to modify a single place rather than having to replace 536 calls :-/
You have a point there, but I don't think it'll work out that easily, since we are using such calls to e.g. pass dynamically allocated buffers to code in extensions (which then have to free the buffers again).
Does anyone recall the rationale to have two families of memory allocators?
The PyMem_*() APIs were needed to have a cross-platform malloc() implementation which returns standard C lib free()able memory, but also behaves well when passing 0 as size.
Yeah, PyMem_Malloc() & PyMem_Free() help to have portable behaviour. But why weren't PyObject_Malloc() & PyObject_Free() used in the first place?
Good question. I guess developers simply thought of PyObject_Malloc() as being for PyObjects, not arbitrary memory buffers, most likely because pymalloc was advertised as an allocator for Python objects, not random chunks of memory.

Also: the PyObject_*() APIs were first introduced with pymalloc, and no one really was interested in going through all the calls to the PyMem_*() APIs and converting those to use the new pymalloc at the time. All this happened between Python 1.5.2 and 2.0.

One of the reasons probably also was that pymalloc originally did not return memory back to the system malloc(). This was changed only some years ago.
An explanation could be that PyMem_Malloc() can be called without the GIL held. But it wasn't true before Python 3.4, since PyMem_Malloc() called (indirectly) PyObject_Malloc() when Python was compiled in debug mode, and PyObject_Malloc() requires the GIL to be held.
When I wrote PEP 445, there was a discussion about the GIL. It was proposed to allow calling PyMem_xxx() without the GIL: https://www.python.org/dev/peps/pep-0445/#gil-free-pymem-malloc
This option was rejected.
AFAIR, the GIL was not really part of the consideration at the time. We used pymalloc for PyObject allocation, that's all.

-- Marc-Andre Lemburg

Thanks for your feedback, you are asking good questions :-) 2016-02-04 13:54 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free().
I would prefer to modify a single place rather than having to replace 536 calls :-/
You have a point there, but I don't think it'll work out that easily, since we are using such calls to e.g. pass dynamically allocated buffers to code in extensions (which then have to free the buffers again).
Ah, interesting. But I'm not sure that we delegate the responsibility of freeing the memory to external libraries. Usually, it's more the opposite: a library gives us an allocated memory block, and we have to free it. No?

I checked whether we call malloc() directly to pass the buffer to a library, but I failed to find such a case.

Again, in debug mode, calling free() on a memory block allocated by PyMem_Malloc() will likely crash. Since we run the Python test suite with a Python compiled in debug mode, we would already have detected such a bug, no?

See also my old issue http://bugs.python.org/issue18203 which replaced almost all direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc().
Good question. I guess developers simply thought of PyObject_Malloc() as being for PyObjects,
Yeah, I also understood that, but in practice, it looks like PyMem_Malloc() is slower than PyObject_Malloc(), so using it makes the code less efficient than it could be.

Instead of teaching developers that, well, in fact PyObject_Malloc() is unrelated to object programming, I think it's simpler to modify PyMem_Malloc() to reuse pymalloc ;-)

Victor

On 04.02.2016 14:25, Victor Stinner wrote:
Thanks for your feedback, you are asking good questions :-)
2016-02-04 13:54 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free().
I would prefer to modify a single place rather than having to replace 536 calls :-/
You have a point there, but I don't think it'll work out that easily, since we are using such calls to e.g. pass dynamically allocated buffers to code in extensions (which then have to free the buffers again).
Ah, interesting. But I'm not sure that we delegate the responsibility of freeing the memory to external libraries. Usually, it's more the opposite: a library gives us an allocated memory block, and we have to free it. No?
Sometimes, yes, but we also do allocations for e.g. parsing values in Python argument tuples (e.g. using "es" or "et"): https://docs.python.org/3.6/c-api/arg.html We do document to use PyMem_Free() on those; not sure whether everyone does this though.
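For context, a short sketch of that "es" pattern; the module function is hypothetical, but the format unit and the PyMem_Free() requirement are as documented in the C API:

    /* PyArg_ParseTuple() with "es" allocates a new encoded buffer;
       the caller must release it with PyMem_Free(), not free(). */
    #include <Python.h>

    static PyObject *
    parse_and_copy(PyObject *self, PyObject *args)
    {
        char *buffer = NULL;   /* filled in by PyArg_ParseTuple() */

        if (!PyArg_ParseTuple(args, "es", "utf-8", &buffer))
            return NULL;

        PyObject *result = PyUnicode_FromString(buffer);
        PyMem_Free(buffer);    /* required by the "es" contract */
        return result;
    }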
I checked whether we call malloc() directly to pass the buffer to a library, but I failed to find such a case.
Again, in debug mode, calling free() on a memory block allocated by PyMem_Malloc() will likely crash. Since we run the Python test suite with a Python compiled in debug mode, we would already have detected such a bug, no?
The Python test suite doesn't test Python C extensions, so it's not surprising that it passes :-)
See also my old issue http://bugs.python.org/issue18203 which replaced almost all direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc().
Good question. I guess developers simply thought of PyObject_Malloc() as being for PyObjects,
Yeah, I also understood that, but in practice, it looks like PyMem_Malloc() is slower than PyObject_Malloc(), so using it makes the code less efficient than it could be.
Instead of teaching developers that, well, in fact PyObject_Malloc() is unrelated to object programming, I think it's simpler to modify PyMem_Malloc() to reuse pymalloc ;-)
Perhaps if you add some guards somewhere :-)

Seriously, this may work if C extensions use the APIs consistently, but in order to tell, we'd need to check a few. I know that I switched over all mx Extensions to use PyObject_*() instead of PyMem_*() or native malloc() several years ago and have not run into any issues.

I guess the main question then is whether pymalloc is good enough for general memory allocation needs; and the answer may well be "yes".

BTW: Tuning pymalloc for commonly used object sizes is another area where Python could gain better performance, i.e. reserve more / pre-allocate space for often used block sizes.

pymalloc will also only work well for small blocks (up to 512 bytes). Everything else is routed to the system malloc().

-- Marc-Andre Lemburg

2016-02-04 15:05 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Sometimes, yes, but we also do allocations for e.g. parsing values in Python argument tuples (e.g. using "es" or "et"):
https://docs.python.org/3.6/c-api/arg.html
We do document to use PyMem_Free() on those; not sure whether everyone does this though.
It's well documented. If programs start to crash, they must be fixed.

I don't propose to "break the API" for free, but to get a speedup for Python overall.

And I don't think that we can say that it's an API change, since we already stated that PyMem_Free() must be used.

If your program has bugs, you can use a debug build of Python 3.5 to detect misuse of the API.
The Python test suite doesn't test Python C extensions, so it's not surprising that it passes :-)
What do you mean by "C extensions"? Which modules? Many modules in the stdlib have "C accelerators" and the PEP 399 now *require* to test the C and Python implementations.
Instead of teaching developers that, well, in fact PyObject_Malloc() is unrelated to object programming, I think it's simpler to modify PyMem_Malloc() to reuse pymalloc ;-)
Perhaps if you add some guards somewhere :-)
We have runtime checks, but they are only implemented in debug mode for efficiency.

By the way, I once proposed adding an environment variable to allow enabling these checks without having to recompile Python. Since PEP 445, it became easy to implement this. What do you think? https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environmen...

"This alternative was rejected because a new environment variable would make Python initialization even more complex. PEP 432 tries to simplify the CPython startup sequence."

PEP 432 looks stuck, so I don't think that we should block enhancements because of this PEP. Anyway, my idea should be easy to implement.
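A sketch of what such a variable could do at startup; the wiring and the variable name (taken from the rejected PEP 445 idea) are assumptions, but PyMem_SetupDebugHooks() is the real PEP 445 call:

    #include <Python.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical startup hook: install the PEP 445 debug hooks when an
       environment variable asks for them, without recompiling Python. */
    static void
    maybe_enable_malloc_debug(void)
    {
        const char *opt = getenv("PYDEBUGMALLOC");
        if (opt != NULL && strcmp(opt, "1") == 0)
            PyMem_SetupDebugHooks();   /* wraps the currently configured allocators */
    }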
Seriously, this may work if C extensions use the APIs consistently, but in order to tell, we'd need to check a few.
Can you suggest names of projects that should be tested?
I guess the main question then is whether pymalloc is good enough for general memory allocation needs; and the answer may well be "yes".
What do you mean by "good enough"? For the runtime performance, pymalloc looks to be faster than malloc(). What are your other criterias? Memory fragmentation? Victor

ping?

Victor

On 12.02.2016 12:18, Victor Stinner wrote:
ping?
Sorry, your email must have gotten lost in my inbox.
2016-02-08 15:18 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
2016-02-04 15:05 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Sometimes, yes, but we also do allocations for e.g. parsing values in Python argument tuples (e.g. using "es" or "et"):
https://docs.python.org/3.6/c-api/arg.html
We do document to use PyMem_Free() on those; not sure whether everyone does this though.
It's well documented. If programs start to crash, they must be fixed.
I don't propose to "break the API" for free, but to get a speedup for Python overall.
And I don't think that we can say that it's an API change, since we already stated that PyMem_Free() must be used.
If your program has bugs, you can use a debug build of Python 3.5 to detect misuse of the API.
Yes, but people don't necessarily do this, e.g. I have for a very long time ignored debug builds completely and when I started to try them, I found that some of the things I had been doing with e.g. free list implementations did not work in debug builds.
The Python test suite doesn't test Python C extensions, so it's not surprising that it passes :-)
What do you mean by "C extensions"? Which modules?
Many modules in the stdlib have "C accelerators", and PEP 399 now *requires* testing both the C and Python implementations.
Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs). It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.
Instead of teaching developers that, well, in fact PyObject_Malloc() is unrelated to object programming, I think it's simpler to modify PyMem_Malloc() to reuse pymalloc ;-)
Perhaps if you add some guards somewhere :-)
We have runtime checks, but they are only implemented in debug mode for efficiency.
By the way, I once proposed adding an environment variable to allow enabling these checks without having to recompile Python. Since PEP 445, it became easy to implement this. What do you think? https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environmen...
"This alternative was rejected because a new environment variable would make Python initialization even more complex. PEP 432 tries to simplify the CPython startup sequence."
PEP 432 looks stuck, so I don't think that we should block enhancements because of this PEP. Anyway, my idea should be easy to implement.
I suppose such a flag would create a noticeable runtime performance hit, since the compiler would no longer be able to inline the PyMem_*() APIs if you redirect those APIs to other sets at runtime. I also don't see much point in carrying around such baggage in production builds of Python, since you'd most likely only want to use the tools to debug C extensions during their development.
Seriously, this may work if C extensions use the APIs consistently, but in order to tell, we'd need to check a few.
Can you suggest names of projects that should be tested?
See above for a list of starters :-) It would be good to add a few more that work on text or larger chunks of memory, since those will most likely utilize the memory allocators more than other extensions which mostly wrap (sets of) C variables. Some of them may also have benchmarks, so in addition to checking whether they work with the change, you could also test performance.
I guess the main question then is whether pymalloc is good enough for general memory allocation needs; and the answer may well be "yes".
What do you mean by "good enough"? For the runtime performance, pymalloc looks to be faster than malloc(). What are your other criterias? Memory fragmentation?
Runtime performance, difference in memory consumption (arenas cannot be freed if there are still small chunks allocated), memory locality. I'm no expert in this, so can't really comment much. I suspect that libc and OS-provided allocators will have advantages as well, but since pymalloc redirects to them for all larger memory chunks, it's probably an overall win for Python C extensions (and Python itself).

-- Marc-Andre Lemburg

Hi, 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Sorry, your email must have gotten lost in my inbox.
no problemo
Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).
It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.
Ok, I will try my patch on some of them. Thanks for the pointers.
I suppose such a flag would create a noticeable runtime performance hit, since the compiler would no longer be able to inline the PyMem_*() APIs if you redirect those APIs to other sets at runtime.
Hum, I think that you missed PEP 445. The overhead of this PEP was discussed and considered negligible enough to implement the PEP: https://www.python.org/dev/peps/pep-0445/#performances

Using PEP 445, there is no overhead to enabling debug hooks at runtime (except for the overhead of the debug checks themselves ;-)). PyMem_Malloc() now goes through a function pointer: https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l319

Same for PyObject_Malloc(): https://hg.python.org/cpython/file/37bacf3fa1f5/Objects/obmalloc.c#l380
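A simplified sketch of that indirection; the table mirrors CPython's real PyMemAllocatorEx struct, but the names here are illustrative:

    #include <stdlib.h>

    /* Each allocator family dispatches through function pointers, so debug
       hooks can be swapped in at runtime with no cost on the fast path. */
    typedef struct {
        void *ctx;
        void *(*malloc)(void *ctx, size_t size);
        void (*free)(void *ctx, void *ptr);
        /* realloc/calloc slots omitted for brevity */
    } allocator_table;

    static void *sys_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
    static void sys_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

    static allocator_table pymem_table = { NULL, sys_malloc, sys_free };

    void *
    sketch_PyMem_Malloc(size_t size)
    {
        /* installing debug hooks simply replaces the entries in pymem_table */
        return pymem_table.malloc(pymem_table.ctx, size);
    }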
I also don't see much point in carrying around such baggage in production builds of Python, since you'd most likely only want to use the tools to debug C extensions during their development.
I propose adding an environment variable because it's rare that a debug build is installed on a system. Usually, using a debug build requires recompiling all C extensions, which is not really... convenient... With such an env var, it would be trivial to quickly check whether the Python memory allocators are used correctly.
Runtime performance, difference in memory consumption (arenas cannot be freed if there are still small chunks allocated), memory locality. I'm no expert in this, so can't really comment much.
"arenas cannot be freed if there are still small chunks allocated" yeah, this is called memory fragmentation. There is a big difference between libc malloc() and pymalloc for small allocations: pymalloc is able to free an arena using munmap() which releases immediatly the memory to the system, whereas most implementation of malloc() use a single contigious memory block which is only shrinked when all memory "at the top" is free. So it's the same fragmentation issue that you described, except that it uses a single arena which has an arbitrary size (between 1 MB and 10 GB, there is no limit), whereas pymalloc uses small arenas of 256 KB. In short, I expect less fragmentation with pymalloc. "memory locality": I have no idea on that. I guess that it can be seen on benchmarks. pymalloc is designed for objects with short lifetime. Victor

2016-02-12 14:31 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
If your program has bugs, you can use a debug build of Python 3.5 to detect misuse of the API.
Yes, but people don't necessarily do this, e.g. I have for a very long time ignored debug builds completely and when I started to try them, I found that some of the things I had been doing with e.g. free list implementations did not work in debug builds.
I just added support for debug hooks on the Python memory allocators for Python compiled in *release* mode. Set the environment variable PYTHONMALLOC to "debug" to try it with Python 3.6.

I added a check to the PyObject_Malloc() debug hook to ensure that the function is called with the GIL held. I opened an issue to add a similar check to PyMem_Malloc(): https://bugs.python.org/issue26563
Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).
It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.
I ran the test suites of numpy, lxml, Pillow and cryptography (which uses cffi).

I found a bug in numpy: numpy calls PyMem_Malloc() without holding the GIL: https://github.com/numpy/numpy/pull/7404

Apart from this bug, all other tests pass with PyMem_Malloc() using pymalloc and all debug checks enabled.

Victor
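The numpy bug follows a simple pattern; a hedged sketch (the function is hypothetical, the macros and allocator calls are real):

    #include <Python.h>

    /* Call with the GIL held. PyMem_Malloc() requires the GIL, so code
       that releases it must use the GIL-free PyMem_Raw*() family. */
    static void
    compute_without_gil(size_t n)
    {
        Py_BEGIN_ALLOW_THREADS
        /* WRONG here: PyMem_Malloc(n) -- the GIL has been released. */
        void *buf = PyMem_RawMalloc(n);   /* OK: safe without the GIL */
        if (buf != NULL) {
            /* ... long-running C computation using buf ... */
            PyMem_RawFree(buf);
        }
        Py_END_ALLOW_THREADS
    }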

So what do you think? Is it worth changing the PyMem_Malloc() allocator to pymalloc for a small speedup? Should we do something else before doing that? Or do you expect that too many applications use PyMem_Malloc() without holding the GIL and will not try to run their application with PYTHONMALLOC=debug?

Victor

Ping? Is someone still opposed to my change #26249 "Change PyMem_Malloc to use pymalloc allocator"? If not, I think that I will push my change.

My change only touches two lines, so it can easily be reverted before CPython 3.6 if we detect major issues in third-party extensions. And maybe it's better to push such a change today, to get more time to play with it, than to push it late in the development of CPython 3.6.

The new PYTHONMALLOC=debug feature makes it quick and easy to check the usage of the PyMem_Malloc() API, even if Python is compiled in release mode. I checked multiple Python extensions written in C. I only found one bug, in numpy, and I sent a patch (not merged yet).

Victor

Hi,

My pull request has been merged into numpy. numpy now uses PyMem_RawMalloc() rather than PyMem_Malloc(), since it uses the memory allocator without holding the GIL: https://github.com/numpy/numpy/pull/7404 It was proposed to modify numpy to hold the GIL. Maybe it will be done later.

It means that there are no more C extensions known to use the Python memory allocators incorrectly. So I pushed my change into CPython to use the pymalloc memory allocator in PyMem_Malloc(): https://hg.python.org/cpython/rev/68b2a43d8653

I documented that porting C extensions to Python 3.6 requires running tests with PYTHONMALLOC=debug. This environment variable enables checks at runtime to validate the usage of the Python memory allocators, including checks on the GIL. PYTHONMALLOC=debug and the check on the GIL are new in Python 3.6.

By the way, I modified the code to log more on fatal errors: if a buffer overflow/underflow is detected in a free function like PyObject_Free() and tracemalloc is enabled, the traceback of where the memory block was allocated is now displayed: https://docs.python.org/dev/whatsnew/3.6.html#pythonmalloc-environment-varia...

Moreover, the warnings logger now also logs where files, sockets, etc. were allocated when a ResourceWarning is emitted: https://docs.python.org/dev/whatsnew/3.6.html#warnings

It looks like Python 3.6 will help developers ;-)

Victor

2016-02-08 15:18 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
Perhaps if you add some guards somewhere :-)
We have runtime checks, but they are only implemented in debug mode for efficiency.
By the way, I once proposed adding an environment variable to allow enabling these checks without having to recompile Python. Since PEP 445, it became easy to implement this. What do you think? https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environmen...
Ok, I wrote a patch to implement a new PYTHONMALLOC environment variable: http://bugs.python.org/issue26516

PYTHONMALLOC=debug installs debug hooks to:

* detect API violations, e.g. PyObject_Free() called on a buffer allocated by PyMem_Malloc()
* detect writes before the start of a buffer (buffer underflow)
* detect writes after the end of a buffer (buffer overflow)

https://docs.python.org/dev/c-api/memory.html#c.PyMem_SetupDebugHooks

The main advantage of this variable is that you don't have to recompile Python in debug mode to benefit from these checks. Recompiling Python in debug mode requires recompiling *all* extension modules, since the debug ABI is incompatible. When I played with tracemalloc on Python 2 ( http://pytracemalloc.readthedocs.org/ ), I had such issues; it was very annoying with non-trivial extension modules like PyQt or PyGTK. With PYTHONMALLOC, you don't have to recompile extension modules anymore!

With tracemalloc and PYTHONMALLOC=debug, we will have a complete tool suite to "debug memory"!

My motivation for PYTHONMALLOC=debug is to detect API violations to prepare my change to the PyMem_Malloc() allocator ( http://bugs.python.org/issue26249 ), but also to help users detect bugs. It's common that users report a bug saying "Python crashed" but have no idea what is responsible for the crash. I hope that detection of buffer underflows & overflows will help them detect bugs in their own extension modules.

Moreover, I added PYTHONMALLOC=malloc to ease the use of external memory debuggers on Python. By default, Python uses the pymalloc allocator for PyObject_Malloc(), which raises a lot of false positives in Valgrind. We even have a configure option (--with-valgrind) and a Valgrind suppression file to be able to skip these false alarms in Valgrind. IMHO PYTHONMALLOC=malloc is a simpler option for using Valgrind (or other tools).

Victor
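As an illustration of the overflow detection, a tiny assumed example that PYTHONMALLOC=debug should catch when the block is freed (not taken from the patch):

    #include <Python.h>

    static void
    overflow_demo(void)
    {
        char *buf = PyMem_Malloc(8);
        if (buf == NULL)
            return;
        buf[8] = 'x';      /* one byte past the end of the buffer */
        PyMem_Free(buf);   /* the debug hooks detect the clobbered guard
                              byte and abort with a detailed report */
    }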

On Wed, 9 Mar 2016 at 06:57 Victor Stinner <victor.stinner@gmail.com> wrote:
2016-02-08 15:18 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
Perhaps if you add some guards somewhere :-)
We have runtime checks, but they are only implemented in debug mode for efficiency.
By the way, I once proposed adding an environment variable to allow enabling these checks without having to recompile Python. Since PEP 445, it became easy to implement this. What do you think?
https://www.python.org/dev/peps/pep-0445/#add-a-new-pydebugmalloc-environmen...
Ok, I wrote a patch to implement a new PYTHONMALLOC environment variable:
http://bugs.python.org/issue26516
PYTHONMALLOC=debug installs debug hooks to:
* detect API violations, e.g. PyObject_Free() called on a buffer allocated by PyMem_Malloc()
* detect writes before the start of a buffer (buffer underflow)
* detect writes after the end of a buffer (buffer overflow)
https://docs.python.org/dev/c-api/memory.html#c.PyMem_SetupDebugHooks
The main advantage of this variable is that you don't have to recompile Python in debug mode to benefit from these checks.
I just wanted to say this all sounds awesome! Thanks for all the hard work on making our memory management story easier to work with, Victor. -Brett

2016-03-09 18:54 GMT+01:00 Brett Cannon <brett@python.org>:
https://docs.python.org/dev/c-api/memory.html#c.PyMem_SetupDebugHooks
The main advantage of this variable is that you don't have to recompile Python in debug mode to benefit from these checks.
I just wanted to say this all sounds awesome! Thanks for all the hard work on making our memory management story easier to work with, Victor.
You're welcome. I pushed my patch adding the PYTHONMALLOC environment variable: https://docs.python.org/dev/whatsnew/3.6.html#pythonmalloc-environment-varia...

Please test PYTHONMALLOC=debug and PYTHONMALLOC=malloc with your favorite application.

I also adjusted the code (like the code handling the PYTHONMALLOCSTATS env var) to be able to use the debug checks in all cases. For example, the debug hooks are now also installed by default when Python is configured in debug mode without pymalloc support.

Victor

M.-A. Lemburg schrieb am 04.02.2016 um 13:54:
On 04.02.2016 13:29, Victor Stinner wrote:
But why weren't PyObject_Malloc() & PyObject_Free() used in the first place?
Good question. I guess developers simply thought of PyObject_Malloc() as being for PyObjects, not arbitrary memory buffers, most likely because pymalloc was advertised as an allocator for Python objects, not random chunks of memory.
Note that the PyObject_Malloc() functions have never been documented. (Well, there are references regarding their mere existence in the docs, but nothing more than that.) https://docs.python.org/3.6/search.html?q=pyobject_malloc&check_keywords=yes&area=default

And, for example, the "what's new in 2.5" document says:

""" Python’s API has many different functions for allocating memory that are grouped into families. For example, PyMem_Malloc(), PyMem_Realloc(), and PyMem_Free() are one family that allocates raw memory, while PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free() are another family that’s supposed to be used for creating Python objects. """

I don't think there are many extensions out there in which *object* memory gets allocated manually, which implicitly puts a pretty clear "don't use" marker on these functions.

Stefan

2016-02-07 9:22 GMT+01:00 Stefan Behnel <stefan_ml@behnel.de>:
Note that the PyObject_Malloc() functions have never been documented.
Yeah, there is an old bug to track this: http://bugs.python.org/issue20064
And, for example, the "what's new in 2.5" document says:
""" Python’s API has many different functions for allocating memory that are grouped into families. For example, PyMem_Malloc(), PyMem_Realloc(), and PyMem_Free() are one family that allocates raw memory, while PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free() are another family that’s supposed to be used for creating Python objects. """
I don't think there are many extensions out there in which *object* memory gets allocated manually, which implicitly puts a pretty clear "don't use" marker on these functions.
Should I understand that it's another good reason to make PyMem_Malloc() faster for everyone? Victor