Travis E. Oliphant wrote:
I've finally caught up with the discussion on aligned allocators for NumPy. In general I'm favorable to the idea, although it is not as easy to implement in 1.0.X because of the need to possibly change the C-API.
The Python solution is workable and would just require a function call on the Python side (which one could call from the C side as well with little difficulty; I believe Chuck Harris already suggested such a function). So, I think at least the Python functions are an easy addition for 1.0.4 (along with simple tests for alignment, although a.ctypes.data % 16 is pretty simple and probably doesn't warrant a new function).
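For reference, here is a minimal sketch of what such a Python-side helper might look like. The name aligned_empty and its signature are hypothetical (this is not an existing NumPy function). It assumes the usual trick of over-allocating a byte buffer and slicing from the first suitably aligned address, and it finishes with the a.ctypes.data % 16 check mentioned above.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, align=16):
    """Hypothetical helper: uninitialized array whose data pointer is a multiple of `align`."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits inside the buffer.
    buf = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip to reach the next multiple of `align`.
    offset = (-buf.ctypes.data) % align
    # The result is a view; it keeps `buf` alive through its base reference.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((3, 4), np.float64, align=16)
print(a.ctypes.data % 16 == 0)
```

Note that the returned array does not own its data, so anything that reallocates (a resize, for instance) would lose the alignment guarantee.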
I'm a bit more resistant to the more involved C-code in the patch provided with #568, because of the requested new additions to the C-API, but I understand the need. I'm currently also thinking heavily about using SIMD intrinsics in ufunc inner loops but will not likely get those in before 1.0.4. Unfortunately, all ufuncs that take advantage of SIMD instructions will have to handle the unaligned portions which may occur even if the start of the array is aligned, so the problem of thinking about alignment does not go away there with a simplified function call.
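To make the point about unaligned portions concrete, here is a rough sketch of the bookkeeping an inner loop would need for a contiguous buffer. It is only an illustration, not the ufunc machinery: simd_split is a hypothetical name, and the sketch assumes the element size divides the alignment. It returns how many elements must be handled one at a time before the first aligned address (head), how many can go through full aligned vectors (body), and how many are left over (tail).

```python
def simd_split(addr, n, itemsize, align=16):
    """Hypothetical helper: (head, body, tail) element counts for a contiguous buffer."""
    # If elements are wider than the alignment, or the start address is not even
    # element-aligned, no element ever starts on an `align` boundary in a usable
    # way, so everything takes the scalar path.
    if itemsize > align or align % itemsize != 0 or addr % itemsize != 0:
        return n, 0, 0
    lanes = align // itemsize                       # elements per aligned vector
    head = min(n, ((-addr) % align) // itemsize)    # scalar elements before alignment
    body = ((n - head) // lanes) * lanes            # elements covered by full vectors
    tail = n - head - body                          # scalar leftovers at the end
    return head, body, tail

print(simd_split(addr=0, n=11, itemsize=8))   # aligned start, odd length
print(simd_split(addr=8, n=10, itemsize=8))   # start is 8 bytes past a 16-byte boundary
```

With a real array one would pass a.ctypes.data, a.size and a.itemsize. The first call shows that an aligned start alone does not remove the scalar path, since an odd length still leaves a tail.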
I don't know anything about the ufunc machinery yet, but I guess you need to know the alignment of a given buffer, right? This can be done easily. Actually, one problem I encountered (if I remember correctly) was that there is no pure C library facility in numpy: by that, I mean a simple C library, independent of Python, which could be reused by C code using numpy. For example, if we want to start thinking about using SIMD, I think it would be good to support the basics in a pure C library. I don't see any downside to this approach.
A simple addition would be a pair of flags, NPY_ALIGNED_16 and NPY_ALIGNED_32, for PyArray_FromAny that could adjust the data pointer as needed to get at least those kinds of alignment.
We can't change the C-API for PyArray_FromAny to accept an alignment flag, and I'm pretty loath to do that even for 1.1.
Is there a consensus? What do others think of the patch in ticket #568? Is there a need to add general-purpose aligned memory allocators to NumPy without a corresponding array_allocator?
Having the NPY_ALIGNED_* flags would already be enough for most cases (for SIMD, you rarely, if ever, need more than 32-byte alignment AFAIK). Those flags plus general-purpose memory allocators (in a C support library) would be enough to do everything we need for fft, for example.
cheers, David