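Another argument for supporting stateful allocators would be compatibility with the stateful C++11 allocator API, where, for example, `std::allocator_traits<Alloc>::allocate` takes the allocator instance itself as its first argument:
https://en.cppreference.com/w/cpp/memory/allocator_traits/allocate.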
Adding support for stateful allocators at a later date would almost certainly cause an ABI break, or a lot of pain to avoid one.
I haven't thought very much about the PyCapsule approach (although it appears some other reviewers on GitHub considered it at one point), but even building it from scratch, the overhead to support statefulness is not large.
As I demonstrate on the GitHub issue (18805), it would amount to changing the API from:
```C
// the version in the NEP
typedef void *(PyDataMem_AllocFunc)(size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(size_t nelems, size_t elsize);
typedef void (PyDataMem_FreeFunc)(void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(void *ptr, size_t size);
typedef struct {
char name[200];
PyDataMem_AllocFunc *alloc;
PyDataMem_ZeroedAllocFunc *zeroed_alloc;
PyDataMem_FreeFunc *free;
PyDataMem_ReallocFunc *realloc;
} PyDataMem_Handler;
const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler);
const char * PyDataMem_GetHandlerName(PyArrayObject *obj);
```
to
```C
// proposed changes: a `PyObject *self` argument pointing to a `PyDataMem_HandlerObject`,
// and a `PyObject_HEAD` at the start of the struct
typedef void *(PyDataMem_AllocFunc)(PyObject *self, size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(PyObject *self, size_t nelems, size_t elsize);
typedef void (PyDataMem_FreeFunc)(PyObject *self, void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(PyObject *self, void *ptr, size_t size);
typedef struct {
PyObject_HEAD
PyDataMem_AllocFunc *alloc;
PyDataMem_ZeroedAllocFunc *zeroed_alloc;
PyDataMem_FreeFunc *free;
PyDataMem_ReallocFunc *realloc;
} PyDataMem_HandlerObject;
// steals a reference to handler; caller is responsible for decrefing the result
PyDataMem_HandlerObject *PyDataMem_SetHandler(PyDataMem_HandlerObject *handler);
// borrowed reference
PyDataMem_HandlerObject *PyDataMem_GetHandler(PyArrayObject *obj);
// some boilerplate that numpy is already full of and doesn't impact users of non-stateful allocators
PyTypeObject PyDataMem_HandlerType = ...;
```
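To illustrate what the `self` argument buys, here is a minimal sketch assuming a Python-free mock of the proposed API. `MockObject`, `MockHandlerObject`, `CountingHandler` and the `counting_*` functions are hypothetical names, and `MockObject` models only the reference count that `PyObject_HEAD` would provide. The handler keeps per-instance state (a count of live blocks), which the NEP's self-less function pointers could only achieve through global variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, Python-free mock of the proposed API. MockObject stands in
   for PyObject and only models the reference count from PyObject_HEAD. */
typedef struct { long refcnt; } MockObject;

typedef void *(MockAllocFunc)(MockObject *self, size_t size);
typedef void *(MockZeroedAllocFunc)(MockObject *self, size_t nelems, size_t elsize);
typedef void (MockFreeFunc)(MockObject *self, void *ptr, size_t size);
typedef void *(MockReallocFunc)(MockObject *self, void *ptr, size_t size);

typedef struct {
    MockObject ob_base;              /* plays the role of PyObject_HEAD */
    MockAllocFunc *alloc;
    MockZeroedAllocFunc *zeroed_alloc;
    MockFreeFunc *free;
    MockReallocFunc *realloc;
} MockHandlerObject;

/* A stateful handler: the base struct is the first member, so a pointer to
   CountingHandler, suitably converted, points to its MockObject header. */
typedef struct {
    MockHandlerObject base;
    size_t live_blocks;              /* per-instance state */
} CountingHandler;

static void *counting_alloc(MockObject *self, size_t size) {
    CountingHandler *h = (CountingHandler *)self;
    void *p = malloc(size);
    if (p != NULL) h->live_blocks++;
    return p;
}

static void *counting_zeroed_alloc(MockObject *self, size_t nelems, size_t elsize) {
    CountingHandler *h = (CountingHandler *)self;
    void *p = calloc(nelems, elsize);
    if (p != NULL) h->live_blocks++;
    return p;
}

static void counting_free(MockObject *self, void *ptr, size_t size) {
    CountingHandler *h = (CountingHandler *)self;
    (void)size;                      /* this handler does not need the size */
    if (ptr != NULL) {
        assert(h->live_blocks > 0);  /* catches frees of foreign pointers */
        h->live_blocks--;
        free(ptr);
    }
}

static void *counting_realloc(MockObject *self, void *ptr, size_t size) {
    CountingHandler *h = (CountingHandler *)self;
    void *p = realloc(ptr, size);
    /* realloc(NULL, n) behaves like malloc, so it creates a new live block */
    if (ptr == NULL && p != NULL) h->live_blocks++;
    return p;
}

/* Builds a handler holding one reference, with its own zeroed state. */
static CountingHandler make_counting_handler(void) {
    CountingHandler h = {
        .base = {
            .ob_base = { .refcnt = 1 },
            .alloc = counting_alloc,
            .zeroed_alloc = counting_zeroed_alloc,
            .free = counting_free,
            .realloc = counting_realloc,
        },
        .live_blocks = 0,
    };
    return h;
}
```

Two handlers built with `make_counting_handler()` keep independent counts, because each call receives its own `self`; with the NEP's signatures, both would have to share whatever global state the functions close over.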
When constructing an array, the reference count of the handler would be incremented before storing it in the array struct, and decremented again when the array is deallocated.
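The reference-counting rule for arrays, combined with the `PyDataMem_SetHandler` semantics given in the comments above, can be sketched as follows. This is again a hypothetical mock (`MockHandler`, `MockArray`, `mock_set_handler`, `mock_array_init` and `mock_array_dealloc` are illustrative names, not NumPy API), with a `destroyed` flag standing in for the handler's real deallocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mock handler: a reference count plus alloc/free slots. */
typedef struct MockHandler MockHandler;
struct MockHandler {
    long refcnt;
    void *(*alloc)(MockHandler *self, size_t size);
    void (*free)(MockHandler *self, void *ptr, size_t size);
    int destroyed;                   /* set when refcnt reaches 0 (demo only) */
};

static void handler_incref(MockHandler *h) { h->refcnt++; }

static void handler_decref(MockHandler *h) {
    assert(h->refcnt > 0);
    if (--h->refcnt == 0) h->destroyed = 1;  /* real code would deallocate here */
}

static void *plain_alloc(MockHandler *self, size_t size) {
    (void)self;
    return malloc(size);
}

static void plain_free(MockHandler *self, void *ptr, size_t size) {
    (void)self;
    (void)size;
    free(ptr);
}

/* The "current" handler; this global owns one reference to it. */
static MockHandler *current_handler = NULL;

/* Mirrors the proposed semantics: steals a reference to `handler` and returns
   the previous handler, whose reference now belongs to the caller. */
static MockHandler *mock_set_handler(MockHandler *handler) {
    MockHandler *old = current_handler;
    current_handler = handler;
    return old;
}

typedef struct {
    MockHandler *mem_handler;        /* owned reference */
    void *data;
    size_t nbytes;
} MockArray;

static int mock_array_init(MockArray *arr, size_t nbytes) {
    MockHandler *h = current_handler;
    void *data = h->alloc(h, nbytes);
    if (data == NULL) return -1;
    handler_incref(h);               /* incref before storing it in the array */
    arr->mem_handler = h;
    arr->data = data;
    arr->nbytes = nbytes;
    return 0;
}

static void mock_array_dealloc(MockArray *arr) {
    /* free with the handler that allocated the data, not the current one */
    arr->mem_handler->free(arr->mem_handler, arr->data, arr->nbytes);
    handler_decref(arr->mem_handler);
    arr->mem_handler = NULL;
    arr->data = NULL;
}
```

An array created before the handler is swapped out keeps its own reference, so it frees its data through the handler that allocated it, and that handler is only destroyed once the last such array is gone.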
Since the extra work needed now to support this is modest, while the potential for ABI headaches down the road is not, I think we should aim to support statefulness right from the start.
The runtime overhead of the stateful approach above vs the NEP approach is negligible, and consists of:
* A one-time cost for setting up an allocator. This likely only happens near startup, so it won't matter.
* An extra incref on each array allocation (and a matching decref when the array is freed).
* An extra pointer argument on the stack for each allocation and deallocation.
* Perhaps around 32 extra bytes per allocator object. Since arrays just store pointers to allocators, this doesn't matter.
Eric