[Numpy-discussion] numpy arrays, data allocation and SIMD alignement

David Cournapeau david at ar.media.kyoto-u.ac.jp
Fri Aug 3 02:06:58 EDT 2007


   Following an ongoing discussion with S. Johnson, one of the developers
of fftw3, I would be interested in what people think about adding
infrastructure in numpy related to SIMD alignment (that is, 16-byte
alignment for SSE/AltiVec; I don't know anything about other archs).
The problem is that right now, it is difficult to get alignment
information in numpy. By alignment here, I mean something different from
what is normally meant in the numpy context: whereas, in my understanding,
NPY_ALIGNED refers to a pointer which is aligned with respect to its type,
here I am talking about alignment on an arbitrary boundary.
  For example, for fftw3, we need to know whether a given data buffer is
16-byte aligned to get optimal performance; generally, SSE needs 16-byte
alignment for optimal performance, as does AltiVec. I think it would be
nice to have some infrastructure to help developers get this kind of
information, and maybe to be able to request 16-byte aligned buffers.
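
To make the distinction concrete, here is a rough sketch in plain C of the kind of check meant here. The name is_aligned is made up for this example and is not part of the numpy API; the only assumption is that the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a numpy function): return 1 if ptr lies on an
 * `alignment`-byte boundary, 0 otherwise. `alignment` must be a non-zero
 * power of two, e.g. 16 for SSE/AltiVec. */
int is_aligned(const void *ptr, size_t alignment)
{
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* The low bits of the address must all be zero. */
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For a numpy array, one would presumably pass the data pointer (PyArray_DATA(arr)) and 16. Note that the second element of a 16-byte aligned buffer of 8-byte doubles is still aligned with respect to its type, but it is not 16-byte aligned, which is exactly the difference described above.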
   Here is what I can think of:
      1. adding an API to know whether a given PyArrayObject has its data
buffer 16-byte aligned, and to request a 16-byte aligned
PyArrayObject. Something like NPY_ALIGNED, basically.
      2. forcing data allocation to be 16-byte aligned in numpy (e.g.
defining PyDataMem_Mem as a 16-byte aligned allocator instead of malloc).
This would mean that many arrays would be "naturally" 16-byte aligned
without effort.

Point 2 is really easy to implement, I think: on some platforms
(Mac OS X and FreeBSD), malloc already returns 16-byte aligned buffers
anyway, so I don't think the wasted space is a real problem. On Linux,
glibc's malloc returns 8-byte aligned buffers; I don't know about Windows.
Implementing our own 16-byte aligned memory allocator for cross-platform
compatibility should be relatively easy. I don't see any drawback, but I
guess other people will.
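
As a minimal sketch of what such a portable allocator could look like, assuming nothing beyond standard malloc/free: over-allocate, round the address up to the next 16-byte boundary, and remember the original pointer just before the aligned block. The names SIMD_ALIGNMENT, aligned16_malloc and aligned16_free are invented for this example and are not existing numpy functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGNMENT 16  /* hypothetical name; 16 bytes for SSE/AltiVec */

/* Allocate `size` bytes on a SIMD_ALIGNMENT boundary; NULL on failure.
 * The pointer returned by malloc is saved just before the aligned block
 * so that aligned16_free can hand it back to free. */
void *aligned16_malloc(size_t size)
{
    const size_t extra = SIMD_ALIGNMENT - 1 + sizeof(void *);
    void *raw;
    uintptr_t start;
    void **aligned;

    if (size > SIZE_MAX - extra)        /* size + extra would overflow */
        return NULL;
    raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round up to the boundary. */
    start = (uintptr_t)raw + sizeof(void *);
    start = (start + SIMD_ALIGNMENT - 1) & ~(uintptr_t)(SIMD_ALIGNMENT - 1);

    aligned = (void **)start;
    aligned[-1] = raw;                  /* remember what malloc returned */
    assert(((uintptr_t)aligned & (SIMD_ALIGNMENT - 1)) == 0);
    return aligned;
}

/* Release a block obtained from aligned16_malloc; NULL is a no-op. */
void aligned16_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

The cost is at most 15 bytes of padding plus one pointer per allocation. Blocks obtained this way must not be passed to plain free or realloc, so a matching realloc would also be needed if numpy's allocation macros were redefined along these lines.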

Point 1 is trickier, as it requires many more changes in the code.

Do the main numpy developers have an opinion on this?


