I have submitted NEP 49 to enable user-defined allocation strategies for the ndarray.data homogeneous memory area. The implementation is in PR 17582 https://github.com/numpy/numpy/pull/17582 Here is the text of the NEP:

Abstract
--------

The ``numpy.ndarray`` requires additional memory allocations to hold ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and ``numpy.ndarray.data`` attributes. These attributes are specially allocated after creating the Python object in the ``__new__`` method. The ``strides`` and ``shape`` are stored in a piece of memory allocated internally.

This NEP proposes a mechanism to override the memory management strategy used for ``ndarray->data`` with user-provided alternatives. This allocation holds the array's data and can be very large. As accessing this data often becomes a performance bottleneck, custom allocation strategies to guarantee data alignment or pin allocations to specialized memory hardware can enable hardware-specific optimizations.

Motivation and Scope
--------------------

Users may wish to override the internal data memory routines with ones of their own. Two such use-cases are to ensure data alignment and to pin certain allocations to certain NUMA cores.

Users who wish to change the NumPy data memory management routines will use :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler` structure to hold pointers to functions used to manage the data memory. The calls are wrapped by internal routines to call :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will use the :c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy for auditing purposes.

Since a call to ``PyDataMem_SetHandler`` will change the default functions, but that function may be called during the lifetime of an ``ndarray`` object, each ``ndarray`` will carry with it the ``PyDataMem_Handler`` struct used at the time of its instantiation, and this struct will be used to reallocate or free the data memory of the instance. Internally, NumPy may use ``memcpy`` or ``memset`` on the data ``ptr``.

Usage and Impact
----------------

The new functions can only be accessed via the NumPy C-API. An example is included later in the NEP. The added ``struct`` will increase the size of the ``ndarray`` object. It is one of the major drawbacks of this approach. We can be reasonably sure that the change in size will have a minimal impact on end-user code because NumPy version 1.20 already changed the object size.

Backward compatibility
----------------------

The design will not break backward compatibility. Projects that were assigning to the ``ndarray->data`` pointer were already breaking the current memory management strategy (backed by ``npy_alloc_cache``) and should restore ``ndarray->data`` before calling ``Py_DECREF``. As mentioned above, the change in size should not impact end-users.

Matti
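To make the handler mechanism concrete, here is a minimal sketch of what a user-supplied data-memory handler could look like. The struct layout, field names, and the exact ``PyDataMem_SetHandler`` signature shown here are assumptions for illustration only (the real interface is whatever the NEP/PR defines), and the example assumes a POSIX platform for ``posix_memalign``:

    /* Illustrative sketch only: the actual PyDataMem_Handler layout and the
     * PyDataMem_SetHandler signature are defined by the NEP/PR; the struct
     * and functions below are hypothetical stand-ins. */
    #define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical handler: a bundle of function pointers for the data memory. */
    typedef struct {
        char  name[64];                                /* for diagnostics */
        void *(*alloc)(size_t size);
        void *(*zeroed_alloc)(size_t nelems, size_t elsize);
        void  (*free)(void *ptr, size_t size);
        /* a real handler would also need a realloc-style hook; omitted here */
    } ExampleDataMemHandler;

    /* Allocate 64-byte aligned memory, e.g. to match cache-line/SIMD width. */
    static void *alloc_aligned64(size_t size)
    {
        void *p = NULL;
        if (posix_memalign(&p, 64, size) != 0) {
            return NULL;
        }
        return p;
    }

    static void *calloc_aligned64(size_t nelems, size_t elsize)
    {
        size_t size = nelems * elsize;   /* overflow check omitted for brevity */
        void *p = alloc_aligned64(size);
        if (p != NULL) {
            memset(p, 0, size);
        }
        return p;
    }

    static void free_aligned64(void *ptr, size_t size)
    {
        (void)size;   /* the size is passed for allocators that need it */
        free(ptr);
    }

    static ExampleDataMemHandler aligned_handler = {
        "aligned64", alloc_aligned64, calloc_aligned64, free_aligned64
    };

    /* Registration would go through the proposed C-API entry point, roughly:
     *     old = PyDataMem_SetHandler(&aligned_handler);
     *     ... create and use arrays ...
     *     PyDataMem_SetHandler(old);    restore the previous strategy
     */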
On Tue, Apr 20, 2021 at 2:18 PM Matti Picus <matti.picus@gmail.com> wrote:
I have submitted NEP 49 to enable user-defined allocation strategies for the ndarray.data homogeneous memory area. The implementation is in PR 17582 https://github.com/numpy/numpy/pull/17582 Here is the text of the NEP:
Thanks Matti!
Abstract
--------
The ``numpy.ndarray`` requires additional memory allocations to hold ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and ``numpy.ndarray.data`` attributes. These attributes are specially allocated after creating the Python object in the ``__new__`` method. The ``strides`` and ``shape`` are stored in a piece of memory allocated internally.
This NEP proposes a mechanism to override the memory management strategy used for ``ndarray->data`` with user-provided alternatives. This allocation holds the array's data and can be very large. As accessing this data often becomes a performance bottleneck, custom allocation strategies to guarantee data alignment or pin allocations to specialized memory hardware can enable hardware-specific optimizations.
Motivation and Scope
--------------------
Users may wish to override the internal data memory routines with ones of their own. Two such use-cases are to ensure data alignment and to pin certain allocations to certain NUMA cores.
It would be great to expand a bit on these two sentences, and add some links. There's a lot of history here in NumPy development to refer to as well:

https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-alloca...
http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allo...
http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-mem...
https://github.com/numpy/numpy/issues/5312
https://github.com/numpy/numpy/issues/14177

There must also be a good amount of ideas/discussion elsewhere. https://bugs.python.org/issue18835 discussed an aligned allocator for Python itself, with fairly detailed discussion about whether/how NumPy could benefit, with (I think) the conclusion that it shouldn't be in Python, but that NumPy/Arrow/others are better off doing their own thing.

I'm wondering if improved memory profiling is a use case as well? Fil (https://github.com/pythonspeed/filprofiler) for example seems to use such a strategy: https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-over...

Does it interact with our tracemalloc support (https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemall...)?
Users who wish to change the NumPy data memory management routines will use :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler` structure to hold pointers to functions used to manage the data memory. The calls are wrapped by internal routines to call :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will use the :c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy for auditing purposes.

Since a call to ``PyDataMem_SetHandler`` will change the default functions, but that function may be called during the lifetime of an ``ndarray`` object, each ``ndarray`` will carry with it the ``PyDataMem_Handler`` struct used at the time of its instantiation, and this struct will be used to reallocate or free the data memory of the instance. Internally, NumPy may use ``memcpy`` or ``memset`` on the data ``ptr``.

This is design, not motivation or scope. Try to not refer to specific function names in this section. I suggest moving this content to the "Detailed design" section (or better, a "high level design" at the start of that section).

Cheers, Ralf
Usage and Impact
----------------
The new functions can only be accessed via the NumPy C-API. An example is included later in the NEP. The added ``struct`` will increase the size of the ``ndarray`` object. It is one of the major drawbacks of this approach. We can be reasonably sure that the change in size will have a minimal impact on end-user code because NumPy version 1.20 already changed the object size.
Backward compatibility
----------------------
The design will not break backward compatibility. Projects that were assigning to the ``ndarray->data`` pointer were already breaking the current memory management strategy (backed by ``npy_alloc_cache``) and should restore ``ndarray->data`` before calling ``Py_DECREF``. As mentioned above, the change in size should not impact end-users.
Matti
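As a concrete illustration of the backward-compatibility point in the quoted text, here is a hedged sketch of the existing (discouraged) pattern of swapping out ``ndarray->data`` and restoring it before ``Py_DECREF``. The helper name ``borrow_data_buffer`` is hypothetical; ``PyArray_BYTES`` and the ``PyArrayObject_fields`` struct are the real NumPy C-API names:

    /* Sketch of the (discouraged) pattern the backward-compatibility section
     * refers to: temporarily pointing ndarray->data at a caller-owned buffer.
     * The original pointer must be restored before Py_DECREF so that the
     * allocator that created the buffer (npy_alloc_cache today, the per-array
     * handler under NEP 49) is also the one that frees it. */
    #include <Python.h>
    #include <numpy/arrayobject.h>

    static void borrow_data_buffer(PyArrayObject *arr, char *my_buffer)
    {
        /* save NumPy's own allocation */
        char *original = PyArray_BYTES(arr);

        /* direct assignment bypasses NumPy's memory management */
        ((PyArrayObject_fields *)arr)->data = my_buffer;

        /* ... use the array with the substituted buffer ... */

        /* restore before the array can be deallocated */
        ((PyArrayObject_fields *)arr)->data = original;
    }
    /* Only after `data` is restored is it safe to Py_DECREF((PyObject *)arr). */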
See my comments interspersed in Ralf's reply. Thanks for the additional context. Matti

On 21/4/21 3:10 am, Ralf Gommers wrote:
...
Motivation and Scope
--------------------
Users may wish to override the internal data memory routines with ones of their own. Two such use-cases are to ensure data alignment and to pin certain allocations to certain NUMA cores.
It would be great to expand a bit on these two sentences, and add some links. There's a lot of history here in NumPy development to refer to as well:
https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-alloca...
http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allo...
http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-mem...
https://github.com/numpy/numpy/issues/5312
https://github.com/numpy/numpy/issues/14177
There must also be a good amount of ideas/discussion elsewhere.
I added more context to this section, trying to focus on the large data allocations in NumPy.
https://bugs.python.org/issue18835 discussed an aligned allocator for Python itself, with fairly detailed discussion about whether/how NumPy could benefit, with (I think) the conclusion that it shouldn't be in Python, but that NumPy/Arrow/others are better off doing their own thing.
I'm wondering if improved memory profiling is a use case as well? Fil (https://github.com/pythonspeed/filprofiler) for example seems to use such a strategy: https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-over...
Thanks. I added a sentence about this as well.
Does it interact with our tracemalloc support (https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemalloc-in-python-3-6)?
I added a sentence about this. The new C-API wrapper functions preserve the current status vis-a-vis tracemalloc support. I am not sure that support is complete. The NEP should not change the situation for better or worse.
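To illustrate what "wrapped by internal routines" means for tracemalloc, here is a schematic sketch: the wrapper forwards to the handler's allocator and then reports the pointer via ``PyTraceMalloc_Track`` / ``PyTraceMalloc_Untrack``. The domain constant and the ``example``/``EXAMPLE`` names are illustrative, not the actual internals in the PR, which also run the ``PyDataMem_EventHookFunc`` hooks:

    /* Schematic only: how an internal wrapper could combine a handler's
     * allocation function with CPython's tracemalloc tracking. */
    #include <Python.h>
    #include <stdint.h>

    #define EXAMPLE_TRACE_DOMAIN 389047u   /* illustrative tracemalloc domain */

    typedef void *(*example_alloc_fn)(size_t size);
    typedef void  (*example_free_fn)(void *ptr);

    static void *wrapped_data_alloc(example_alloc_fn user_alloc, size_t size)
    {
        void *ptr = user_alloc(size);               /* the handler's allocator */
        if (ptr != NULL) {
            /* report the allocation so Python's tracemalloc sees it */
            PyTraceMalloc_Track(EXAMPLE_TRACE_DOMAIN, (uintptr_t)ptr, size);
        }
        return ptr;
    }

    static void wrapped_data_free(example_free_fn user_free, void *ptr)
    {
        PyTraceMalloc_Untrack(EXAMPLE_TRACE_DOMAIN, (uintptr_t)ptr);
        user_free(ptr);
    }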
Users who wish to change the NumPy data memory management routines will use
This is design, not motivation or scope. Try to not refer to specific function names in this section. I suggest moving this content to the "Detailed design" section (or better, a "high level design" at the start of that section).
Done.
Cheers, Ralf