[Numpy-discussion] NEP 49: Data allocation strategies
matti.picus at gmail.com
Tue Apr 20 08:17:59 EDT 2021
I have submitted NEP 49 to enable user-defined allocation strategies for
the ndarray.data homogeneous memory area. The implementation is in PR
17582 https://github.com/numpy/numpy/pull/17582 Here is the text of the NEP:
The ``numpy.ndarray`` requires additional memory allocations
to hold the ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
``numpy.ndarray.data`` attributes. These attributes are specially allocated
after creating the Python object in the ``__new__`` method. The ``strides`` and
``shape`` are stored in a piece of memory allocated internally.
This NEP proposes a mechanism to override the memory management strategy used
for ``ndarray->data`` with user-provided alternatives. This allocation holds
the array's data and can be very large. As accessing this data often becomes
a performance bottleneck, custom allocation strategies to guarantee data
alignment or pinning allocations to specialized memory hardware can enable
hardware-specific optimizations.
Motivation and Scope
Users may wish to override the internal data memory routines with ones of
their own. Two such use-cases are to ensure data alignment and to pin certain
allocations to certain NUMA cores.
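To make the alignment use-case concrete, here is a minimal sketch of an allocation routine that guarantees a chosen alignment. The names ``demo_aligned_malloc`` and ``demo_aligned_free`` are hypothetical and are not part of NumPy or of this NEP; the over-allocate-and-round-up technique is only one of several ways to get aligned memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of NumPy: return `size` bytes whose address
 * is a multiple of `alignment` (which must be a power of two). The pointer
 * obtained from malloc is stashed just before the returned block so that it
 * can be released later. */
static void *demo_aligned_malloc(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Extra room for worst-case padding plus the stashed pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra) {
        return NULL;
    }
    void *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }

    /* Round up to the next multiple of `alignment`, leaving space for the
     * stashed pointer in front of the aligned address. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* memcpy avoids a misaligned store when alignment < sizeof(void *). */
    memcpy((char *)aligned - sizeof(void *), &raw, sizeof raw);
    return (void *)aligned;
}

/* Release a block obtained from demo_aligned_malloc. */
static void demo_aligned_free(void *ptr)
{
    if (ptr != NULL) {
        void *raw;
        memcpy(&raw, (char *)ptr - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

A caller would pair the two routines, for example ``p = demo_aligned_malloc(n, 64)`` followed by ``demo_aligned_free(p)``. A block from this allocator must never be passed to plain ``free``, which is one reason the allocating and freeing routines need to travel together.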
Users who wish to change the NumPy data memory management routines will use
:c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
structure to hold pointers to functions used to manage the data memory. The
calls are wrapped by internal routines to call :c:func:`PyTraceMalloc_Track`,
:c:func:`PyTraceMalloc_Untrack`, and will use the
:c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy for
auditing purposes.
Since a call to ``PyDataMem_SetHandler`` will change the default functions, but
that function may be called during the lifetime of an ``ndarray`` object, each
``ndarray`` will carry with it the ``PyDataMem_Handler`` struct used at the
time of its instantiation, and these will be used to reallocate or free the
data memory of the instance. Internally NumPy may use ``memcpy`` or ``memset``
on the data ``ptr``.
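The reason each array must remember its handler can be illustrated with a small standalone sketch. Everything here (``demo_handler``, ``demo_array``, ``demo_set_handler`` and the field names) is a hypothetical stand-in and does not reproduce the layout of ``PyDataMem_Handler`` or the signature of ``PyDataMem_SetHandler``; it only shows the idea of pairing every buffer with the routines that created it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table of data-memory routines; field names are illustrative. */
typedef struct {
    const char *name;
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
    int live;   /* buffers allocated through this handler and not yet freed */
} demo_handler;

static void *plain_alloc(size_t size) { return malloc(size); }
static void plain_release(void *ptr) { free(ptr); }

static demo_handler handler_a = {"a", plain_alloc, plain_release, 0};
static demo_handler handler_b = {"b", plain_alloc, plain_release, 0};

/* Process-wide default used for newly created arrays. */
static demo_handler *current_handler = &handler_a;

/* Swap the default handler and return the previous one. */
static demo_handler *demo_set_handler(demo_handler *h)
{
    demo_handler *old = current_handler;
    current_handler = h;
    return old;
}

/* Toy array object that remembers which handler allocated its data. */
typedef struct {
    void *data;
    size_t nbytes;
    demo_handler *handler;
} demo_array;

static int demo_array_init(demo_array *arr, size_t nbytes)
{
    demo_handler *h = current_handler;   /* captured at creation time */
    arr->data = h->alloc(nbytes);
    if (arr->data == NULL) {
        return -1;
    }
    arr->nbytes = nbytes;
    arr->handler = h;
    h->live++;
    return 0;
}

static void demo_array_clear(demo_array *arr)
{
    /* Use the remembered handler, not the current default. */
    arr->handler->release(arr->data);
    arr->handler->live--;
    arr->data = NULL;
}
```

If ``demo_array_clear`` consulted ``current_handler`` instead, an array created before a call to ``demo_set_handler`` would be released by routines that never allocated it.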
Usage and Impact
The new functions can only be accessed via the NumPy C-API. An example is
included later in the NEP. The added ``struct`` will increase the size of the
``ndarray`` object. It is one of the major drawbacks of this approach. We can
be reasonably sure that the change in size will have a minimal impact on
end-user code because NumPy version 1.20 already changed the object size.
The design will not break backward compatibility. Projects that were assigning
to the ``ndarray->data`` pointer were already breaking the current memory
management strategy (backed by ``npy_alloc_cache``) and should restore
``ndarray->data`` before calling ``Py_DECREF``. As mentioned above, the change
in size should not impact end-users.
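The restore-before-release advice can be sketched with a toy object. ``toy_array``, ``toy_decref`` and ``toy_peek_external`` are hypothetical and do not use the NumPy C-API; the sketch only shows why a borrowed pointer has to be swapped back before the owning object frees its data.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an array object; this is not NumPy's real layout. */
typedef struct {
    char *data;      /* buffer owned by the object */
    int refcount;
} toy_array;

static toy_array *toy_new(size_t nbytes)
{
    toy_array *arr = malloc(sizeof *arr);
    if (arr == NULL) {
        return NULL;
    }
    arr->data = calloc(nbytes, 1);
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    arr->refcount = 1;
    return arr;
}

/* Drop one reference. At zero, free whatever `data` points to and the
 * object itself. Returns 1 if the object was destroyed, 0 otherwise. */
static int toy_decref(toy_array *arr)
{
    if (--arr->refcount > 0) {
        return 0;
    }
    free(arr->data);
    free(arr);
    return 1;
}

/* Temporarily point the object at a caller-owned buffer, then restore the
 * original pointer so a later toy_decref frees the right memory. */
static char toy_peek_external(toy_array *arr, char *external)
{
    char *saved = arr->data;
    arr->data = external;
    char first = arr->data[0];   /* stand-in for real work on the buffer */
    arr->data = saved;           /* restore before any decref can run */
    return first;
}
```

Without the restore step, ``toy_decref`` would call ``free`` on the caller-owned buffer and leak the original one.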