[Numpy-discussion] NEP 49: Data allocation strategies

Wed Apr 21 03:10:22 EDT 2021

On Tue, Apr 20, 2021 at 2:18 PM Matti Picus <matti.picus at gmail.com> wrote:

> I have submitted NEP 49 to enable user-defined allocation strategies for
> the ndarray.data homogeneous memory area. The implementation is in PR
> 17582 https://github.com/numpy/numpy/pull/17582 Here is the text of the
> NEP:
>

Thanks Matti!

>
> Abstract
> --------
>
> The ``numpy.ndarray`` requires additional memory allocations
> to hold ``numpy.ndarray.strides``, ``numpy.ndarray.shape`` and
> ``numpy.ndarray.data`` attributes. These attributes are specially allocated
> after creating the Python object in the ``__new__`` method. The ``strides`` and
> ``shape`` are stored in a piece of memory allocated internally.
>
> This NEP proposes a mechanism to override the memory management strategy
> used for ``ndarray->data`` with user-provided alternatives. This allocation
> holds the array's data and can be very large. As accessing this data often
> becomes a performance bottleneck, custom allocation strategies that
> guarantee data alignment or pin allocations to specialized memory hardware
> can enable hardware-specific optimizations.
>
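To make the alignment use case concrete, here is a minimal sketch in plain C of one strategy a user-provided allocator might implement. It does not touch the NumPy C-API; `DATA_ALIGNMENT`, `aligned_data_alloc`, `aligned_data_free` and `is_aligned` are hypothetical names chosen for illustration. It over-allocates with malloc, rounds the address up to a 64-byte boundary, and stashes the original pointer just below the aligned block so the matching free can recover it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical target alignment: must be a power of two and at least
 * sizeof(void *). 64 bytes matches a common cache-line / AVX-512 width. */
#define DATA_ALIGNMENT 64

/* Return a block of `size` bytes whose address is a multiple of
 * DATA_ALIGNMENT, or NULL on failure. */
static void *aligned_data_alloc(size_t size)
{
    /* Room to shift up to the next boundary plus one slot for the raw pointer. */
    const size_t extra = (DATA_ALIGNMENT - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra) {
        return NULL; /* size + extra would overflow */
    }
    void *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + (DATA_ALIGNMENT - 1))
                        & ~(uintptr_t)(DATA_ALIGNMENT - 1);
    ((void **)aligned)[-1] = raw; /* remember malloc's pointer just below */
    return (void *)aligned;
}

/* Release a block obtained from aligned_data_alloc (never plain free()). */
static void aligned_data_free(void *ptr)
{
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);
    }
}

/* 1 if ptr sits on a DATA_ALIGNMENT boundary, else 0. */
static int is_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % DATA_ALIGNMENT) == 0;
}
```

A pointer returned by `aligned_data_alloc` must never be handed to plain free(); memory obtained from a custom strategy has to be released by that same strategy.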
> Motivation and Scope
> --------------------
>
> Users may wish to override the internal data memory routines with ones of
> their own. Two such use-cases are to ensure data alignment and to pin
> certain allocations to certain NUMA cores.
>

It would be great to expand a bit on these two sentences, and add some
links. There's a lot of history here in NumPy development to refer to as
well:

https://numpy-discussion.scipy.narkive.com/MvmMkJcK/numpy-arrays-data-allocation-and-simd-alignement
http://numpy-discussion.10968.n7.nabble.com/Aligned-configurable-memory-allocation-td39712.html
http://numpy-discussion.10968.n7.nabble.com/Numpy-s-policy-for-releasing-memory-td1533.html
https://github.com/numpy/numpy/issues/5312
https://github.com/numpy/numpy/issues/14177

There must also be a good amount of ideas/discussion elsewhere.

https://bugs.python.org/issue18835 discussed an aligned allocator for
Python itself, with fairly detailed discussion about whether/how NumPy
could benefit. The conclusion was (I think) that it shouldn't be in Python,
and that NumPy/Arrow/others are better off doing their own thing.

I'm wondering if improved memory profiling is a use case as well? Fil (
https://github.com/pythonspeed/filprofiler) for example seems to use such a
strategy:
https://github.com/pythonspeed/filprofiler/blob/master/design/allocator-overrides.md

Does it interact with our tracemalloc support?
https://numpy.org/doc/stable/release/1.13.0-notes.html#support-for-tracemalloc-in-python-3-6

> Users who wish to change the NumPy data memory management routines will use
>

This is design, not motivation or scope. Try not to refer to specific
function names in this section. I suggest moving this content to the
"Detailed design" section (or better, a "high-level design" at the start of
that section).

Cheers,
Ralf

> :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
> structure to hold pointers to functions used to manage the data memory. The
> calls are wrapped by internal routines that call
> :c:func:`PyTraceMalloc_Track` and :c:func:`PyTraceMalloc_Untrack`, and that
> use the :c:func:`PyDataMem_EventHookFunc` mechanism already present in NumPy
> for auditing purposes.
>
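A minimal, self-contained sketch of the wrapping described above, assuming a handler is simply a struct of function pointers. The `data_handler` struct, its field names, the `wrapped_*` functions and `trace_track`/`trace_untrack` are all hypothetical: the last two are stand-ins for PyTraceMalloc_Track/PyTraceMalloc_Untrack that merely count live blocks, so the example builds without Python or NumPy headers. It is not NumPy's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handler: a bundle of function pointers playing the role the
 * NEP assigns to PyDataMem_Handler (names are illustrative only). */
typedef struct {
    const char *name;
    void *(*alloc_fn)(size_t size);
    void *(*realloc_fn)(void *ptr, size_t new_size);
    void (*free_fn)(void *ptr);
} data_handler;

/* Stand-ins for PyTraceMalloc_Track / PyTraceMalloc_Untrack. Like the real
 * functions they take the address as a uintptr_t, but here they only count
 * live blocks so that the wrapping is observable. */
static long live_blocks = 0;

static void trace_track(uintptr_t ptr, size_t size)
{
    (void)ptr;
    (void)size;
    live_blocks++;
}

static void trace_untrack(uintptr_t ptr)
{
    (void)ptr;
    live_blocks--;
}

/* A default handler built directly on the C library allocator. */
static const data_handler default_handler = {"default", malloc, realloc, free};

/* Internal wrappers: call the handler's function, then report to the tracer. */
static void *wrapped_alloc(const data_handler *h, size_t size)
{
    void *p = h->alloc_fn(size);
    if (p != NULL) {
        trace_track((uintptr_t)p, size);
    }
    return p;
}

/* Assumes new_size > 0, since realloc(p, 0) is implementation-defined. */
static void *wrapped_realloc(const data_handler *h, void *p, size_t new_size)
{
    uintptr_t old = (uintptr_t)p; /* record the address before it may be freed */
    void *q = h->realloc_fn(p, new_size);
    if (q != NULL) {
        if (old != 0) {
            trace_untrack(old);
        }
        trace_track((uintptr_t)q, new_size);
    }
    return q;
}

static void wrapped_free(const data_handler *h, void *p)
{
    if (p == NULL) {
        return;
    }
    trace_untrack((uintptr_t)p);
    h->free_fn(p);
}
```

Keeping the tracing in the wrappers means a user-supplied handler never has to know about tracemalloc; it only has to allocate and free.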
> Since a call to ``PyDataMem_SetHandler`` changes the default functions,
> and that function may be called during the lifetime of an ``ndarray``
> object, each ``ndarray`` will carry with it the ``PyDataMem_Handler`` struct
> used at the time of its instantiation, and that struct's functions will be
> used to reallocate or free the data memory of the instance. Internally NumPy
> may use ``memcpy`` or ``memset`` on the data ``ptr``.
>
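Why each array keeps its own handler can be shown with a toy model in C. Everything here is hypothetical (`toy_array`, `set_handler`, `current_handler`, the two counting handlers) and only mirrors the idea in the paragraph above: the handler in effect at allocation time is stored on the object and used to release its data, even after the process-wide default has been changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handler: just an allocate/free pair. */
typedef struct {
    const char *name;
    void *(*alloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} data_handler;

/* Two handlers that both use malloc/free underneath but count frees
 * separately, so we can see which one released a buffer. */
static int frees_a = 0;
static int frees_b = 0;
static void free_a(void *p) { frees_a++; free(p); }
static void free_b(void *p) { frees_b++; free(p); }
static const data_handler handler_a = {"a", malloc, free_a};
static const data_handler handler_b = {"b", malloc, free_b};

/* Process-wide default, analogous to what a SetHandler-style call changes. */
static const data_handler *current_handler = &handler_a;

/* Install a new default handler and return the previous one. */
static const data_handler *set_handler(const data_handler *h)
{
    const data_handler *old = current_handler;
    current_handler = h;
    return old;
}

/* Toy stand-in for an ndarray: records the handler that allocated its data. */
typedef struct {
    void *data;
    size_t nbytes;
    const data_handler *handler;
} toy_array;

static int toy_array_init(toy_array *arr, size_t nbytes)
{
    arr->handler = current_handler; /* captured at instantiation time */
    arr->data = arr->handler->alloc_fn(nbytes);
    arr->nbytes = nbytes;
    return arr->data != NULL;
}

static void toy_array_release(toy_array *arr)
{
    /* Use the handler stored on the array, not whatever is current now. */
    arr->handler->free_fn(arr->data);
    arr->data = NULL;
}
```

If `toy_array_release` looked up `current_handler` instead, a buffer from `handler_a` would be freed by `handler_b`; with handlers backed by genuinely different allocators (for example, one that stashes the original pointer before an aligned block) that would be undefined behaviour.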
> Usage and Impact
> ----------------
>
> The new functions can only be accessed via the NumPy C-API. An example is
> included later in the NEP. The added ``struct`` will increase the size of
> the ``ndarray`` object, which is one of the major drawbacks of this
> approach. We can be reasonably sure that the change in size will have a
> minimal impact on end-user code because NumPy version 1.20 already changed
> the object size.
>
> Backward compatibility
> ----------------------
>
> The design will not break backward compatibility. Projects that were
> assigning to the ``ndarray->data`` pointer were already breaking the current
> memory management strategy (backed by ``npy_alloc_cache``) and should
> restore ``ndarray->data`` before calling ``Py_DECREF``. As mentioned above,
> the change in size should not impact end-users.
>
> Matti