[Numpy-discussion] numpy arrays, data allocation and SIMD alignment
strawman at astraw.com
Fri Aug 3 11:12:44 EDT 2007
Both ideas, particularly the 2nd, would be excellent additions to numpy.
I often use the Intel IPP (Integrated Performance Primitives) library
together with numpy, but I have to do all my memory allocation with
IPP to ensure the fastest operation. I then create numpy views of the data.
All this works brilliantly, but it would be really nice if I could
allocate the memory directly in numpy.
IPP allocates, and says it wants, 32-byte-aligned memory (see, e.g.,
http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given
that fftw3 apparently wants 16-byte-aligned memory, my feeling is that,
if the effort is made, the alignment width should be specified at
run time rather than hard-coded.
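A minimal sketch of run-time-selectable alignment, assuming only plain numpy: over-allocate a byte buffer by align extra bytes, then slice it so the data pointer lands on an align-byte boundary. The name aligned_empty is hypothetical; it is not an existing numpy function.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, align=16):
    """Hypothetical helper: uninitialized array whose data pointer is a
    multiple of `align` bytes, with `align` chosen at run time
    (e.g. 16 for SSE/fftw3, 32 for IPP)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits.
    raw = np.empty(nbytes + align, dtype=np.uint8)
    # Bytes to skip so that the start address becomes a multiple of `align`.
    offset = (-raw.ctypes.data) % align
    # Drop the padding, reinterpret the bytes as `dtype`, and reshape.
    return raw[offset:offset + nbytes].view(dtype).reshape(shape)
```

For example, aligned_empty((1024,), np.complex128, align=32) would suit IPP, and align=16 would suit fftw3. The result keeps the underlying byte buffer alive through its .base chain, so no separate free is needed.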
As for implementing your 1st point, I don't know how much effort your
idea would take (and it does sound nice), but even a simple function,
numpy.is_mem_aligned(ndarray, width=16), that returns a bool would
already be of some benefit.
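The proposed check is essentially a one-liner. A minimal sketch, using the hypothetical name from the suggestion (numpy has no such function) and reading the data pointer through the array's ctypes attribute:

```python
import numpy as np

def is_mem_aligned(arr, width=16):
    """Hypothetical helper: True if the first byte of arr's data buffer
    sits on a `width`-byte boundary."""
    return arr.ctypes.data % width == 0
```

Note that slicing moves the data pointer: if a is a 16-byte-aligned float64 array, a[1:] starts 8 bytes later and fails the check, while a[2:] passes again.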
David Cournapeau wrote:
> Following an ongoing discussion with S. Johnson, one of the developers
> of fftw3, I would be interested to hear what people think about adding
> infrastructure to numpy for SIMD alignment (that is, 16-byte
> alignment for SSE/AltiVec; I don't know anything about other archs).
> The problem is that right now it is difficult to get alignment
> information in numpy (by alignment here, I mean something different
> from what is normally meant in the numpy context: whereas, in my
> understanding, NPY_ALIGNED refers to a pointer which is aligned with
> respect to its type, here I am talking about arbitrary alignment).
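To illustrate the distinction, the sketch below builds a float64 array whose data starts 8 bytes past a 16-byte boundary. numpy's existing flag (NPY_ALIGNED in C, arr.flags.aligned in Python) reports it as aligned, because only the item type's alignment is checked, even though the buffer is not 16-byte aligned as SSE would want. The printed values assume a platform where float64 needs at most 8-byte alignment, as on common x86 and x86-64 builds.

```python
import numpy as np

# Carve a float64 view out of a byte buffer so that its start address is
# 8 bytes past a 16-byte boundary: aligned for float64, but not for SSE.
buf = np.zeros(64, dtype=np.uint8)
offset = (8 - buf.ctypes.data) % 16
a = buf[offset:offset + 32].view(np.float64)

print(a.flags.aligned)     # True: aligned with respect to the item type
print(a.ctypes.data % 16)  # 8: not on a 16-byte boundary
```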
> For example, for fftw3 we need to know whether a given data buffer is
> 16-byte aligned to get optimal performance; in general, both SSE and
> AltiVec need 16-byte alignment for optimal performance. I think it
> would be nice to have some infrastructure to help developers get that
> kind of information, and maybe to be able to request 16-byte-aligned buffers.
> Here is what I can think of:
> - adding an API to know whether a given PyArrayObject has its data
> buffer 16-byte aligned, and to request a 16-byte-aligned
> PyArrayObject. Something like NPY_ALIGNED, basically.
> - forcing data allocation in numpy to be 16-byte aligned (e.g.
> defining PyDataMem_Mem as a 16-byte-aligned allocator instead of malloc).
> This would mean that many arrays would be "naturally" 16-byte aligned
> without effort.
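For the "request" half of the first point, a minimal Python-level sketch, assuming a hypothetical helper name ensure_aligned: it returns the array unchanged when its data already sits on a width-byte boundary, and otherwise copies it into an over-allocated buffer sliced to that boundary. In spirit this resembles np.require(arr, requirements=['ALIGNED']), except that np.require only guarantees alignment for the item type, not an arbitrary width.

```python
import numpy as np

def ensure_aligned(arr, width=16):
    """Hypothetical helper: return `arr` itself if its data starts on a
    `width`-byte boundary, otherwise a C-contiguous copy placed in a
    `width`-aligned buffer."""
    if arr.ctypes.data % width == 0:
        return arr
    nbytes = arr.nbytes
    # Over-allocate, then skip ahead to the next `width`-byte boundary.
    raw = np.empty(nbytes + width, dtype=np.uint8)
    offset = (-raw.ctypes.data) % width
    out = raw[offset:offset + nbytes].view(arr.dtype).reshape(arr.shape)
    out[...] = arr  # copy the values into the aligned buffer
    return out
```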
> Point 2 is really easy to implement, I think: on some platforms
> (Mac OS X and FreeBSD), malloc returns 16-byte-aligned buffers
> anyway, so I don't think the wasted space is a real problem. Linux with
> glibc gives 8-byte alignment; I don't know about Windows. Implementing our
> own 16-byte-aligned memory allocator for cross-platform compatibility
> should be relatively easy. I don't see any drawback, but I guess other
> people will.
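One common way to build such a cross-platform allocator (not necessarily what numpy would use) is to over-allocate with plain malloc, round the pointer up, and stash the original pointer just below the aligned block so the matching free can recover it; POSIX posix_memalign and MSVC's _aligned_malloc provide the same service natively. The address arithmetic is sketched below in Python, with plain integers standing in for pointers; the 8-byte pointer size is an assumption (64-bit).

```python
def align_up(addr, align=16):
    """Round `addr` up to the next multiple of `align` (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def aligned_layout(raw_addr, align=16, ptr_size=8):
    """Address arithmetic for a malloc-based aligned allocator (sketch).

    Assumes the allocator asked malloc for size + align - 1 + ptr_size
    bytes and got back `raw_addr`. The aligned block starts after room
    for one pointer; that slot, just below the aligned address, would
    hold `raw_addr` so that the matching free can recover it.
    """
    aligned = align_up(raw_addr + ptr_size, align)
    header = aligned - ptr_size  # where raw_addr would be stored
    return aligned, header
```

The extra align - 1 + ptr_size bytes per allocation are the only overhead, which is small compared with typical array sizes.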
> Point 1 is trickier, as it requires many more changes to the code.
> Do the main numpy developers have an opinion on this?