[Numpy-discussion] NEP 49: Data allocation strategies
ralf.gommers at gmail.com
Wed Apr 21 03:10:22 EDT 2021
On Tue, Apr 20, 2021 at 2:18 PM Matti Picus <matti.picus at gmail.com> wrote:
> I have submitted NEP 49 to enable user-defined allocation strategies for
> the ndarray.data homogeneous memory area. The implementation is in PR
> 17582 https://github.com/numpy/numpy/pull/17582. Here is the text of the NEP:
> The ``numpy.ndarray`` requires additional memory allocations
> to hold ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
> ``numpy.ndarray.data`` attributes. These attributes are specially allocated
> after creating the Python object in the ``__new__`` method. The ``strides``
> and ``shape`` are stored in a piece of memory allocated internally.
> This NEP proposes a mechanism to override the memory management strategy
> for ``ndarray->data`` with user-provided alternatives. This allocation
> holds the array's data and can be very large. As accessing this data is
> often a performance bottleneck, custom allocation strategies to guarantee
> data alignment or pin allocations to specialized memory hardware can
> enable hardware-specific optimizations.
> Motivation and Scope
> Users may wish to override the internal data memory routines with ones
> of their
> own. Two such use-cases are to ensure data alignment and to pin certain
> allocations to certain NUMA nodes.
It would be great to expand a bit on these two sentences, and add some
links. There's a lot of history here in NumPy development to refer to as
well. There must also be a good amount of ideas/discussion elsewhere.
https://bugs.python.org/issue18835 discussed an aligned allocator for
Python itself, with fairly detailed discussion about whether/how NumPy
could benefit, with (I think) the conclusion that it shouldn't be in
Python, but NumPy/Arrow/others are better off doing their own thing.
I'm wondering if improved memory profiling is a use case as well? Fil (
https://github.com/pythonspeed/filprofiler) for example seems to use such a
mechanism. Does it interact with our tracemalloc support?
> Users who wish to change the NumPy data memory management routines will use
This is design, not motivation or scope. Try to not refer to specific
function names in this section. I suggest moving this content to the
"Detailed design" section (or better, a "high level design" at the start of
that section).
> :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
> structure to hold pointers to functions used to manage the data memory. The
> calls are wrapped by internal routines to call
> :c:func:`PyTraceMalloc_Untrack`, and will use the
> :c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy for
> auditing purposes.
> Since a call to ``PyDataMem_SetHandler`` will change the default functions,
> and that function may be called during the lifetime of an ``ndarray``
> object, each
> ``ndarray`` will carry with it the ``PyDataMem_Handler`` struct used at the
> time of its instantiation, and these will be used to reallocate or free the
> data memory of the instance. Internally NumPy may use ``memcpy`` or
> ``memset`` on the data ``ptr``.
> Usage and Impact
> The new functions can only be accessed via the NumPy C-API. An example is
> included later in the NEP. The added ``struct`` will increase the size of
> the ``ndarray`` object; this is one of the major drawbacks of this
> approach. We can be reasonably sure that the change in size will have a
> minimal impact on end-user code because NumPy version 1.20 already changed
> the object size.
> Backward compatibility
> The design will not break backward compatibility. Projects that were
> reassigning the ``ndarray->data`` pointer were already breaking the
> current memory management strategy (backed by ``npy_alloc_cache``) and
> should restore ``ndarray->data`` before calling ``Py_DECREF``. As
> mentioned above, the change in size should not impact end-users.