
Note that PEP-445, which introduced `PyMemAllocatorEx`, specifically rejected omitting the `ctx` argument (https://www.python.org/dev/peps/pep-0445/#id23), which is another argument in favor of having it.
I'll try to give a more thorough justification for the pyobject / capsule suggestion in another message in the next few days.
On Thu, 13 May 2021 at 17:06, eliaskoromilas elias.koromilas@gmail.com wrote:
Eric Wieser wrote:
Yes, sorry, it had been a while since I had looked it up:
https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx
That `PyMemAllocatorEx` looks almost exactly like one of the two variants I was proposing. Is there a reason for wanting to define our own structure vs just using that one? I think the NEP should at least offer a brief comparison to that structure, even if we ultimately end up not using it.
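For concreteness, here is that structure as documented in the CPython C-API docs linked above; note that every slot receives the `ctx` pointer as its first argument:

    typedef struct {
        /* user context passed as the first argument to the four functions */
        void *ctx;
        void *(*malloc)(void *ctx, size_t size);
        void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
        void *(*realloc)(void *ctx, void *ptr, size_t new_size);
        void (*free)(void *ctx, void *ptr);
    } PyMemAllocatorEx;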
I have to say it feels a bit like exposing things publicly that are really mainly used internally, but I'm not sure... Presumably Python uses the `ctx` for something, though.
I'd argue `ctx` / `baton` / `user_data` arguments are an essential part of any C callback API. I can't find any particularly good reference for this right now, but I have been bitten multiple times by C APIs that forget to add this argument.
If someone wants a different strategy (i.e. different alignment) they create a new policy
The crux of the problem here is that without very nasty hacks, C and C++ do not allow new functions to be created at runtime. This makes it very awkward to write a parameterizable allocator. If you want to create two aligned allocators with different alignments, and you don't have a `ctx` argument to plumb through that alignment information, you're forced to write the entire thing twice.
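As a rough sketch (not the NEP's proposed API; `aligned_ctx` and `aligned_malloc` are made-up names, and `posix_memalign` is used only to keep the example short), a `ctx` argument lets a single routine serve any number of alignment policies:

    #include <stdlib.h>

    typedef struct {
        size_t alignment;   /* per-policy state carried through ctx */
    } aligned_ctx;

    static void *aligned_malloc(void *ctx, size_t size)
    {
        void *p = NULL;
        if (posix_memalign(&p, ((aligned_ctx *)ctx)->alignment, size) != 0) {
            return NULL;
        }
        return p;
    }

    /* Two policies, one implementation: each pairs aligned_malloc with its
     * own ctx. Without ctx, aligned_malloc_64 and aligned_malloc_4096 would
     * have to be written (or macro-generated) as separate functions. */
    static aligned_ctx cacheline_ctx = {64};
    static aligned_ctx page_ctx = {4096};

With a `PyMemAllocatorEx`-style layout, each policy is then just the pair of `aligned_malloc` with a pointer to its own `aligned_ctx`.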
The `PyMemAllocatorEx` memory API will allow closure-like (lambda-style) definition of the data mem routines. That's the main idea behind the `ctx` thing; it's huge and will enable every allocation scenario.
In my opinion, the rest of the proposals (PyObjects, PyCapsules, etc.) are secondary and could be considered out of scope. I would suggest letting people use this before hiding it behind a strict API.
Let me also give you an insight into how we plan to do it, since we are the first to integrate this into production code. Treating this NEP as a primitive API, I developed a new project to address our requirements:
1. Provide a Python-native way to define a new numpy allocator
2. Accept data mem routine symbols (function pointers) from open dynamic libraries
3. Allow local-scoped allocation, e.g. inside a `with` statement
But since there was not much fun in those alone, I thought it would be nice if we could exploit `ctypes` callback functions to allow developers to hook into such routines natively (e.g. for debugging/monitoring), or even write them entirely in Python (of course, there has to be an underlying memory allocation API).
For example, the idea is to be able to define a page-aligned allocator in ~30 lines of Python code, like this:
https://github.com/inaccel/numpy-allocator/blob/master/test/aligned_allocato...
While experimenting with this project I spotted the two following issues:
- Thread-locality
My biggest concern is the global scope of the numpy `current_allocator` variable. Currently, an allocator change is applied globally, affecting every thread. This behavior breaks the local-scoped allocation promise of my project. Imagine, for example, the implications of allocating pinned (page-locked) memory (since you mention this use case a lot) for random glue-code ndarrays in background threads.
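As a purely conceptual sketch of that concern (this is not NumPy's code; `current_malloc`, `pinned_malloc` and `worker` are made-up names): if the current allocator is a single process-wide pointer, a scoped swap in one thread is observed by allocations running concurrently in every other thread:

    #include <stdlib.h>
    #include <pthread.h>

    typedef void *(*malloc_like)(size_t);

    /* stand-in for e.g. a pinned/page-locked allocator */
    static void *pinned_malloc(size_t n) { return malloc(n); }

    static malloc_like current_malloc = malloc;  /* single global "current allocator" */

    static void *worker(void *unused)
    {
        (void)unused;
        /* a background thread allocating a "glue-code" buffer picks up
           whatever allocator happens to be installed globally right now */
        return current_malloc(1024);
    }

    int main(void)
    {
        pthread_t t;
        void *buf;
        current_malloc = pinned_malloc;          /* entering the `with` block in thread A ... */
        pthread_create(&t, NULL, worker, NULL);  /* ... silently affects thread B as well */
        pthread_join(t, &buf);
        free(buf);
        current_malloc = malloc;                 /* leaving the `with` block */
        return 0;
    }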
- Allocator context (already discussed)
I found a bug when I tried to use a Python callback (`ctypes.CFUNCTYPE`) for the `PyDataMem_FreeFunc` routine. Since there are cases in which the `free` routine is invoked after a PyErr has occurred (to clean up internal arrays, for example), `ctypes` messes with the exception state badly. This problem can be resolved with the use of a `ctx` (allocator context) that will allow the routines to run clean of errors, wrapping them like this:
    static void wrapped_free(void *ptr, size_t size, void *ctx)
    {
        PyObject *type;
        PyObject *value;
        PyObject *traceback;
        PyErr_Fetch(&type, &value, &traceback);
        ((PyDataMem_Context *) ctx)->free(ptr, size);
        PyErr_Restore(type, value, traceback);
    }
Note: This bug doesn't affect `CDLL` members (`CFuncPtr` objects), since they are pure `dlsym` pointers.
Of course, this is a simple case of how a `ctx` could be useful for an allocation policy. I guess people can become very creative with this in general.
Elias