On Tue, Jun 23, 2020 at 03:47, Neil Schemenauer wrote:
> One aspect of the API that could be improved is memory management
> for PyObjects. The current API is quite a mess and for no good
> reason except legacy, IMHO. The original API design allowed
> extension types to use their own memory allocator. E.g. they could
> call their own malloc()/free() implementation and the rest of the
> CPython runtime would handle that. One consequence is that
> Py_DECREF() cannot call PyObject_Free() but instead has to call
> tp_dealloc(). There were supposed to be multiple layers of
> allocators, PyMem vs PyObject, but since the layering was not
> enforced, we ended up with a bunch of aliases to the same underlying
I vaguely recall someone explaining that the Python memory allocator
created high memory fragmentation, and that using a dedicated memory
allocator was far more efficient. But I concur that the majority of
people never override the default tp_new and tp_free functions.
Not so much Python's memory allocator (it does better than most), but just plain malloc. However, the answer in these cases isn't to replace the allocator for a few extension types, since that wouldn't affect any of Python's own allocations. The better answer is to replace malloc altogether.

At Google we use tcmalloc for everything, by linking it into the binaries we build. However, the effect on Python's allocations isn't very big (but it's still measurable) because obmalloc does a pretty good job; we do it more for the C/C++ libraries we end up wrapping, where it can matter *a lot*. I don't think we ever set tp_new/tp_free to anything other than the defaults, and we could certainly live with it going away.

We also experimented with disabling obmalloc when using tcmalloc, but obmalloc still does measurably better than tcmalloc.
There's another reason not to have different allocators, at least not ones that don't trickle down to 'malloc': AddressSanitizer and ThreadSanitizer rely on intercepting all allocations, and they are *very* useful tools for any C/C++ codebase. They don't (at the moment) particularly benefit Python code, but they certainly do benefit CPython extensions and the C/C++ libraries they wrap.
I think the ability to provide per-type allocation/deallocation routines isn't really about efficiency, but more about giving embedding systems (or libraries wrapped by extension modules) more control over how *their* objects are allocated. It doesn't make much sense, however, because Python wouldn't allocate their objects anyway, just the Python objects wrapping theirs. Allocating CPython objects should be CPython's job.
FWIW, I suspect the biggest problem with getting rid of tp_new/tp_free is code that does *more* than just allocate in those functions, only because the authors didn't realise they should be doing it in tp_alloc/tp_dealloc instead.