[Numpy-discussion] numpy allocation event hooks

Thouis (Ray) Jones thouis at gmail.com
Mon Jun 18 09:58:19 EDT 2012

On Mon, Jun 18, 2012 at 3:46 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 06/18/2012 12:14 PM, Thouis (Ray) Jones wrote:
>> Based on some previous discussion on the numpy list [1] and in
>> now-cancelled PRs [2,3], I'd like to solicit opinions on adding an
>> interface for numpy memory allocation event tracking, as implemented
>> in this PR:
>> https://github.com/numpy/numpy/pull/309
>> A brief summary of the changes:
>> - PyDataMem_NEW/FREE/RENEW become functions in the numpy API.
>>    (they used to be macros for malloc/free/realloc)
>>    These are the functions used to manage allocations for an array's
>>    internal data.  Most other numpy data is allocated through Python's
>>    allocator.
>> - PyDataMem_NEW/RENEW return void* instead of char*.
>> - Adds PyDataMem_SetEventHook() to the API, with this description:
>>   * Sets the allocation event hook for numpy array data.
>>   * Takes a PyDataMem_EventHookFunc *, which has the signature:
>>   *        void hook(void *old, void *new, size_t size, void *user_data).
>>   *   Also takes a void *user_data, and void **old_data.
>>   *
>>   * Returns a pointer to the previous hook or NULL.  If old_data is
>>   * non-NULL, the previous user_data pointer will be copied to it.
>>   *
>>   * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW:
>>   *   result = PyDataMem_NEW(size)        ->  (*hook)(NULL, result, size, user_data)
>>   *   PyDataMem_FREE(ptr)                 ->  (*hook)(ptr, NULL, 0, user_data)
>>   *   result = PyDataMem_RENEW(ptr, size) ->  (*hook)(ptr, result, size, user_data)
>>   *
>>   * When the hook is called, the GIL will be held by the calling
>>   * thread.  The hook should be written to be reentrant, if it performs
>>   * operations that might cause new allocation events (such as the
>>   * creation/destruction of numpy objects, or creating/destroying Python
>>   * objects which might cause a gc).
>> The PR also includes an example using the hook functions to track
>> allocation via Python callback functions (in
>> tools/allocation_tracking).
>> Why I think this is worth adding to numpy, even though other tools may
>> be able to provide similar functionality:
>> - numpy arrays use orders of magnitude more memory than most python
>>    objects, and this is often a limiting factor in algorithms.
>> - numpy can behave in complicated ways with regard to memory
>>    management, e.g., views, OWNDATA, temporaries, etc., making it
>>    sometimes difficult to know where memory usage problems are
>>    happening and why.
>> - numpy attracts a large number of programmers with limited low-level
>>    programming expertise, and who don't have the skills to use external
>>    tools (or time/motivation to acquire those skills), but still need
>>    to be able to diagnose these sorts of problems.
>> - Other tools are not well integrated with Python, and vary a great
>>    deal between OS and compiler setup.
>> I appreciate any feedback.
> Are the hooks able to change how allocation happens/override allocation?
> If one goes to this much pain already, I think one might as well go the
> extra step and allow hooks to override memory allocation.
> At least something to think about -- of course the above (as I
> understand it) would be a good start on a pluggable allocator even if it
> isn't done right away.
> Examples:
>  - Allocate NumPy arrays in process-shared memory using shmem/mmap
>  - Allocate NumPy arrays on some boundary (16-byte, 4096-byte..) using
> memalign

Overriding allocation is not part of the current change, but I chose
the name "EventHook" rather than the more generic "Hook" so that it
would not collide with a change like that in the future.
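
For reference, the event-hook behaviour summarized in the quoted
description can be sketched in plain C roughly as follows. This is only
an illustration and does not link against numpy: demo_new, demo_free,
demo_renew and demo_set_event_hook are hypothetical stand-ins for
PyDataMem_NEW, PyDataMem_FREE, PyDataMem_RENEW and
PyDataMem_SetEventHook, and DemoStats and demo_count_hook are made-up
names for a trivial event counter. The actual code in the PR may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hook signature as given in the summary: (old, new, size, user_data). */
typedef void (DemoEventHookFunc)(void *old_ptr, void *new_ptr,
                                 size_t size, void *user_data);

/* Currently installed hook and its user data (hypothetical globals). */
static DemoEventHookFunc *demo_hook = NULL;
static void *demo_hook_data = NULL;

/* Stand-in for PyDataMem_SetEventHook: installs newhook, returns the
 * previous hook (or NULL), and copies the previous user_data pointer to
 * *old_data when old_data is non-NULL. */
DemoEventHookFunc *demo_set_event_hook(DemoEventHookFunc *newhook,
                                       void *user_data, void **old_data)
{
    DemoEventHookFunc *previous = demo_hook;
    if (old_data != NULL) {
        *old_data = demo_hook_data;
    }
    demo_hook = newhook;
    demo_hook_data = user_data;
    return previous;
}

/* Stand-ins for PyDataMem_NEW/FREE/RENEW: each wraps the libc call and
 * reports the event to the hook at the end. The old pointer is passed
 * only as an opaque identifier; a hook must never dereference it. */
void *demo_new(size_t size)
{
    void *result = malloc(size);
    if (demo_hook != NULL) {
        (*demo_hook)(NULL, result, size, demo_hook_data);
    }
    return result;
}

void demo_free(void *ptr)
{
    free(ptr);
    if (demo_hook != NULL) {
        (*demo_hook)(ptr, NULL, 0, demo_hook_data);
    }
}

void *demo_renew(void *ptr, size_t size)
{
    void *result = realloc(ptr, size);
    if (demo_hook != NULL) {
        (*demo_hook)(ptr, result, size, demo_hook_data);
    }
    return result;
}

/* Example user_data: counts events and sums the requested sizes. */
typedef struct {
    size_t news;
    size_t frees;
    size_t renews;
    size_t bytes_requested;
} DemoStats;

/* Example hook: classifies the event from which pointers are NULL.
 * For simplicity it does not treat failed allocations specially. */
void demo_count_hook(void *old_ptr, void *new_ptr, size_t size,
                     void *user_data)
{
    DemoStats *stats = user_data;
    assert(stats != NULL);
    if (old_ptr == NULL) {
        stats->news++;
    } else if (new_ptr == NULL) {
        stats->frees++;
    } else {
        stats->renews++;
    }
    stats->bytes_requested += size;
}
```

A caller would install the counter with
demo_set_event_hook(demo_count_hook, &stats, NULL), run the code of
interest, and then restore the previous hook using the returned
pointer. A hook that only records events in this way cannot change
where the memory comes from, which is the distinction between this
interface and a pluggable allocator.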
