Another argument for supporting stateful allocators would be compatibility with the stateful C++11 allocator API, such as

Adding support for stateful allocators at a later date would almost certainly create an ABI breakage or lots of pain around avoiding one.

I haven't thought very much about the PyCapsule approach (although it appears some other reviewers on GitHub considered it at one point), but even building it from scratch, the overhead to support statefulness is not large.
As I demonstrate on the GitHub issue (18805), it would amount to changing the API from:
// the version in the NEP
typedef void *(PyDataMem_AllocFunc)(size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(size_t nelems, size_t elsize);
typedef void (PyDataMem_FreeFunc)(void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(void *ptr, size_t size);
typedef struct {
    char name[200];
    PyDataMem_AllocFunc *alloc;
    PyDataMem_ZeroedAllocFunc *zeroed_alloc;
    PyDataMem_FreeFunc *free;
    PyDataMem_ReallocFunc *realloc;
} PyDataMem_Handler;
const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler);
const char * PyDataMem_GetHandlerName(PyArrayObject *obj);
to:
// proposed changes: a `PyObject *self` argument pointing to a `PyDataMem_HandlerObject` and a `PyObject_HEAD`
typedef void *(PyDataMem_AllocFunc)(PyObject *self, size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(PyObject *self, size_t nelems, size_t elsize);
typedef void (PyDataMem_FreeFunc)(PyObject *self, void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(PyObject *self, void *ptr, size_t size);
typedef struct {
    PyObject_HEAD
    PyDataMem_AllocFunc *alloc;
    PyDataMem_ZeroedAllocFunc *zeroed_alloc;
    PyDataMem_FreeFunc *free;
    PyDataMem_ReallocFunc *realloc;
} PyDataMem_HandlerObject;
// steals a reference to handler, caller is responsible for decrefing the result
PyDataMem_HandlerObject * PyDataMem_SetHandler(PyDataMem_HandlerObject *handler);
// borrowed reference
PyDataMem_HandlerObject * PyDataMem_GetHandler(PyArrayObject *obj);

// some boilerplate that numpy is already full of and doesn't impact users of non-stateful allocators
PyTypeObject PyDataMem_HandlerType = ...;
When constructing an array, the reference count of the handler would be incremented before storing it in the array struct.
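
To make the stateful case concrete, here is a rough sketch of the configurable-alignment allocator from my original review, written against the shape of the proposed API. It is only an illustration: `HandlerHead`, `AlignedHandler`, and the `aligned_handler_*` functions are hypothetical names, and a plain struct stands in for `PyObject_HEAD` so that the snippet compiles without the Python headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject_HEAD, so this compiles without Python.h. */
typedef struct { long refcnt; } HandlerHead;

typedef struct AlignedHandler AlignedHandler;
typedef void *(AlignedAllocFunc)(AlignedHandler *self, size_t size);
typedef void (AlignedFreeFunc)(AlignedHandler *self, void *ptr, size_t size);

struct AlignedHandler {
    HandlerHead head;
    AlignedAllocFunc *alloc;
    AlignedFreeFunc *free;
    size_t alignment;  /* per-instance state: the configurable N */
};

/* Over-allocate, round up to a multiple of self->alignment, and stash the
 * pointer returned by malloc just before the aligned block. */
static void *aligned_handler_alloc(AlignedHandler *self, size_t size) {
    size_t n = self->alignment;
    assert(n != 0 && (n & (n - 1)) == 0);  /* N must be a power of two */
    if (size > SIZE_MAX - n - sizeof(void *)) {
        return NULL;  /* the padded request would overflow size_t */
    }
    void *raw = malloc(size + n + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + n - 1) & ~((uintptr_t)n - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

static void aligned_handler_free(AlignedHandler *self, void *ptr, size_t size) {
    (void)self;
    (void)size;
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);  /* recover the pointer malloc returned */
    }
}

/* Each call creates an independent handler carrying its own N. */
static AlignedHandler *aligned_handler_new(size_t alignment) {
    AlignedHandler *h = malloc(sizeof *h);
    if (h == NULL) {
        return NULL;
    }
    h->head.refcnt = 1;
    h->alloc = aligned_handler_alloc;
    h->free = aligned_handler_free;
    h->alignment = alignment;
    return h;
}
```

With the stateless signatures in the NEP, `N` would have to live in a global, so two handlers with different alignments could not coexist in this way.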

Since the extra work now to support this is not awful, but the potential for ABI headaches down the road is, I think we should aim to support statefulness right from the start.
The runtime overhead of the stateful approach above vs the NEP approach is negligible, and consists of:
* Some overhead for setting up an allocator. This likely only happens near startup, so it won't matter.
* An extra incref on each array allocation.
* An extra pointer argument on the stack for each allocation and deallocation.
* Perhaps around 32 extra bytes per allocator object. Since arrays just store pointers to allocators, this doesn't matter.
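
The per-array costs in the list above can be seen in a toy model of the array lifecycle. This is a sketch under stated assumptions, not NumPy's implementation: `ToyArray`, `CountingHandler`, `HandlerHead`, and the helper functions are all hypothetical, and a bare reference count stands in for `PyObject_HEAD`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject_HEAD. */
typedef struct { long refcnt; } HandlerHead;

typedef struct CountingHandler CountingHandler;
struct CountingHandler {
    HandlerHead head;
    void *(*alloc)(CountingHandler *self, size_t size);
    void (*free)(CountingHandler *self, void *ptr, size_t size);
    size_t live_bytes;  /* handler state: bytes currently allocated */
};

typedef struct {
    void *data;
    size_t nbytes;
    CountingHandler *handler;  /* the array only stores a pointer */
} ToyArray;

static void *counting_alloc(CountingHandler *self, size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        self->live_bytes += size;
    }
    return p;
}

static void counting_free(CountingHandler *self, void *ptr, size_t size) {
    if (ptr != NULL) {
        self->live_bytes -= size;
        free(ptr);
    }
}

static CountingHandler *counting_handler_new(void) {
    CountingHandler *h = malloc(sizeof *h);
    if (h == NULL) {
        return NULL;
    }
    h->head.refcnt = 1;
    h->alloc = counting_alloc;
    h->free = counting_free;
    h->live_bytes = 0;
    return h;
}

static void handler_decref(CountingHandler *h) {
    assert(h->head.refcnt > 0);
    if (--h->head.refcnt == 0) {
        free(h);
    }
}

static ToyArray *toy_array_new(CountingHandler *handler, size_t nbytes) {
    ToyArray *arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    arr->data = handler->alloc(handler, nbytes);  /* extra pointer argument */
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    handler->head.refcnt++;  /* the extra incref per array allocation */
    arr->nbytes = nbytes;
    arr->handler = handler;
    return arr;
}

static void toy_array_free(ToyArray *arr) {
    /* Free through the handler that allocated the data, then release it. */
    arr->handler->free(arr->handler, arr->data, arr->nbytes);
    handler_decref(arr->handler);
    free(arr);
}
```

Because the array holds its own reference, the handler stays alive until the last array using it is freed, even if whoever created the handler has already dropped theirs.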


On Thu, 6 May 2021 at 12:43, Matti Picus wrote:

On 6/5/21 2:07 pm, Eric Wieser wrote:
> The NEP looks good, but I worry the API isn't flexible enough. My two
> main concerns are:
> ### Stateful allocators
> Consider an allocator that aligns to `N` bytes, where `N` is
> configurable from a python call in someone else's extension module.
> ...
> ### Thread and async-local allocators
> For tracing purposes, I expect it to be valuable to be able to
> configure the allocator within a single thread / coroutine.
> If we want to support this, we'd most likely want to work with the
> PEP567 ContextVar API rather than a half-baked thread_local solution
> that doesn't work for async code.
> This problem isn't as pressing as the statefulness problem.
> Fixing it would amount to extending the `PyDataMem_SetHandler` API,
> and would be unlikely to break any code written against the current
> version of the NEP; meaning it would be fine to leave as a follow-up.
> It might still be worth remarking upon as future work of some kind in
> the NEP.
I would prefer to leave both of these to a future extension for the NEP.
Setting the alignment from a python-level call seems to be asking for
trouble, and I would need to be convinced that the extra layer of
flexibility is worth it.

It might be worth mentioning that this NEP may be extended in the
future, but truthfully I think that is the case for all NEPs.


NumPy-Discussion mailing list