[Numpy-discussion] Byte aligned arrays

Henry Gomersall heng at cantab.net
Wed Dec 19 11:47:25 EST 2012

On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote:
> Not sure which interface is more useful to users. On the one hand,
> using funny dtypes makes regular non-SIMD access more cumbersome, and
> it forces your array size to be a multiple of the SIMD word size,
> which might be inconvenient if your code is smart enough to handle
> arbitrary-sized arrays with partial SIMD acceleration (i.e., using
> SIMD for most of the array, and then a slow path to handle any partial
> word at the end). OTOH, if your code *is* that smart, you should
> probably just make it smart enough to handle a partial word at the
> beginning as well and then you won't need any special alignment in the
> first place, and representing each SIMD word as a single numpy scalar
> is an intuitively appealing model of how SIMD works. OTOOH, just
> adding a single argument to np.array() is much simpler to explain than
> some elaborate scheme involving the creation of special custom dtypes.

If it helps, my use-case is in wrapping the FFTW library. FFTW _is_
smart enough to deal with unaligned arrays, but doing so incurs a
performance penalty. In the case of an FFT, there are clearly going to
be issues with the power-of-two indices in the array not lying on a
suitable n-byte boundary (which would be the case with a misaligned
array), but I imagine FFTs are not unique in this respect.
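
As a minimal sketch of what "misaligned" means here, the following
checks whether an array's data pointer lies on an n-byte boundary. The
helper name is_aligned and the default of 16 bytes are assumptions made
for illustration; they are not part of numpy or FFTW.

```python
import numpy as np

def is_aligned(arr, n=16):
    """Return True if the first byte of arr's data lies on an n-byte boundary.

    Hypothetical helper for illustration; n=16 is an assumed SIMD width.
    """
    return arr.ctypes.data % n == 0

a = np.zeros(1025, dtype=np.float64)
b = a[1:]  # same buffer, but starting 8 bytes later

# a and b start 8 bytes apart, so at most one of them can sit on a
# 16-byte boundary. Which one does depends on the allocator.
print(is_aligned(a), is_aligned(b))
```

A wrapper could use a check like this to decide whether to tell the
underlying library that its input is unaligned.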

The other point is that it's easy to create a suitable power-of-two
array that should always bypass any special-case unaligned code (e.g.
with floats, any array length that is a multiple of 4 will fill every
16-byte word).
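
The arithmetic behind that example can be sketched as follows, assuming
"floats" means 4-byte (float32) elements and a 16-byte SIMD word. The
function name fills_whole_words is hypothetical. The result only says
the array ends on a word boundary; that helps only if the array also
starts on one.

```python
import numpy as np

def fills_whole_words(length, dtype=np.float32, word_bytes=16):
    """True if `length` elements of `dtype` occupy a whole number of words.

    Hypothetical helper; word_bytes=16 is an assumed SIMD word size.
    """
    return (length * np.dtype(dtype).itemsize) % word_bytes == 0

# 4 float32 values fit in one 16-byte word, so multiples of 4 leave no
# partial word at the end; other lengths do.
print(fills_whole_words(8))   # 8 * 4 = 32 bytes, two whole words
print(fills_whole_words(6))   # 6 * 4 = 24 bytes, one and a half words
```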

Finally, I think there is significant value in auto-aligning the array
based on an appropriate inspection of the CPU capabilities (or
alternatively, a function that reports back the appropriate SIMD
alignment). Again, this makes it easier to wrap libraries that may
function with any alignment, but benefit from optimum alignment.
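
One common way to get an aligned array with existing numpy calls is to
over-allocate a byte buffer and slice it at the first aligned address.
The sketch below assumes that approach; aligned_empty is a hypothetical
name, and the alignment n is simply a parameter here, since choosing it
from the CPU capabilities is exactly the part that would need the
function described above.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, n=16):
    """Return an uninitialised array whose data starts on an n-byte boundary.

    Hypothetical helper: over-allocates by n bytes, then slices the
    buffer so that the first element sits at an address divisible by n.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + n, dtype=np.uint8)
    offset = (-buf.ctypes.data) % n  # bytes to skip to reach the boundary
    # The returned view keeps `buf` alive through its base reference.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

x = aligned_empty((3, 5), dtype=np.float32, n=32)
print(x.ctypes.data % 32)  # 0
```

The returned array is a view onto the larger buffer, so operations that
copy it (for example x.copy()) give no alignment guarantee.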



