Re: [Numpy-discussion] numpy arrays, data allocation and SIMD alignement

4 Aug 2007

...
Here's a hack that google turned up:
(1) Use static variables instead of dynamic (stack) variables
(2) Use in-line assembly code that explicitly aligns data
(3) In C code, use "malloc" to explicitly allocate variables
Here is Intel's example of (2):
; procedure prologue
    push ebp
    mov esp, ebp
    and ebp, -8
    sub esp, 12
; procedure epilogue
    add esp, 12
    pop ebp
    ret
Intel's example of (3), slightly modified:
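    #include <stdint.h>  /* uintptr_t */
    double *p, *newp;
    p = (double *)malloc(sizeof(double) * NPTS + 4);
    /* Round the byte address up to a multiple of 8 (assumes p is at least 4-byte aligned). */
    newp = (double *)(((uintptr_t)p + 4) & ~(uintptr_t)7);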
This assures that newp is 8-byte aligned even if p is not. However,
malloc() may already follow Intel's recommendation that 32-byte or
greater data structures be aligned on a 32-byte boundary. In that case,
increasing the requested memory by 4 bytes and computing newp are
superfluous.
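
The masking step in (3) is easier to check on plain integers. Below is a minimal Python sketch of the general round-up rule; align_up is a hypothetical helper name, not something from Intel's note or from numpy. With no assumption about the incoming address, the padding has to be alignment - 1 bytes; adding only 4 is enough for an 8-byte boundary only if the allocator already returns 4-byte-aligned pointers.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment - 1 and clearing the low bits never moves below addr.
    return (addr + alignment - 1) & ~(alignment - 1)


# An already aligned address is unchanged; anything else moves up to the boundary.
for addr in (0x1000, 0x1004, 0x1007):
    print(hex(addr), "->", hex(align_up(addr, 8)))
```
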
I think that for numpy arrays it should be possible to define the
offset so that the result is 32-byte aligned. However, this might
break some people's code if they haven't paid attention to the offset.
Why? I really don't see how it can break anything at the source code
level. You don't have to care about things you didn't care about before:
the best proof of that is that numpy runs on different platforms where
malloc has different alignment guarantees (Mac OS X already aligns to
16 bytes, for the very reason of making optimizing with SIMD easier,
whereas glibc malloc only aligns to 8 bytes, at least on Linux).
...
Another possibility is to allocate an oversized array, check the 
pointer, and take a range out of it. For instance:
In [32]: a = zeros(10)
In [33]: a.ctypes.data % 32
Out[33]: 16
The array data starts 16 bytes past a 32-byte boundary, so skipping two 8-byte doubles gives
In [34]: a[2:].ctypes.data % 32
Out[34]: 0
Voilà, 32-byte alignment. I think a short Python routine could do
this, which ought to serve well for 1D FFTs. Multidimensional arrays
will be trickier if you want the rows to be aligned. Aligning the
columns just isn't going to work.
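
A short routine along these lines might look like the sketch below. It is only an illustration of the over-allocate-and-slice idea for the 1D case; aligned_zeros is a hypothetical name, not a numpy function, and it assumes the requested alignment is a multiple of the item size.

```python
import numpy as np


def aligned_zeros(n, dtype=np.float64, alignment=32):
    """Return a zeroed 1-D array of n items whose data pointer is a multiple of alignment."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate raw bytes so an aligned start always exists inside the buffer.
    buf = np.zeros(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next boundary (0 if already aligned).
    offset = -buf.ctypes.data % alignment
    # Slice the aligned byte range and reinterpret it; the view keeps buf alive.
    return buf[offset:offset + nbytes].view(dtype)


a = aligned_zeros(10)
print(a.ctypes.data % 32)  # 0
```

Working on a byte buffer rather than slicing a float array means the offset does not have to be a whole number of items, so the same routine covers any power-of-two alignment.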
I am not suggesting realigning existing arrays. What I would like numpy 
to support are the following cases:
- Check whether a given numpy array is SIMD aligned:

/* Simple case: if aligned, use the optimized function; otherwise use the
   non-optimized one. PyArray_ISALIGNED_SIMD is the proposed check. */
int simd_func(double* in, size_t n);
int nosimd_func(double* in, size_t n);

if (PyArray_ISALIGNED_SIMD(a)) {
    simd_func((double *)PyArray_DATA(a), PyArray_SIZE(a));
} else {
    nosimd_func((double *)PyArray_DATA(a), PyArray_SIZE(a));
}
- Explicitly request an aligned array from any PyArray_* function
which creates an ndarray, e.g.: ar = PyArray_FROM_OF(a, NPY_SIMD_ALIGNED);
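
For comparison, the same check-and-dispatch idea can be prototyped from Python with what numpy already exposes. This is a rough sketch only: is_simd_aligned, process and the 16-byte SIMD_ALIGNMENT value are assumptions made for illustration, not existing numpy names, and the two return strings stand in for calls to an optimized and a fallback routine.

```python
import numpy as np

SIMD_ALIGNMENT = 16  # assumed boundary in bytes (e.g. SSE); not a numpy constant


def is_simd_aligned(a, alignment=SIMD_ALIGNMENT):
    """True if the array is C-contiguous and its data starts on the boundary."""
    return bool(a.flags.c_contiguous and a.ctypes.data % alignment == 0)


def process(a):
    # Stand-ins for dispatching to an optimized or a fallback routine.
    return "simd" if is_simd_aligned(a) else "nosimd"


# Build one aligned and one deliberately shifted view over the same byte buffer.
buf = np.zeros(128, dtype=np.uint8)
start = -buf.ctypes.data % SIMD_ALIGNMENT
aligned = buf[start:start + 64].view(np.float64)
shifted = buf[start + 8:start + 72].view(np.float64)
print(process(aligned), process(shifted))  # simd nosimd
```
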

Allocating a buffer with a given alignment is not the problem:
there is a POSIX function to do it, and we can easily implement an
equivalent for the OSes that do not support it. This would be done in C,
not in Python.

cheers,

David

David Cournapeau