On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote:
Not sure which interface is more useful to users. On the one hand, using funny dtypes makes regular non-SIMD access more cumbersome, and it forces your array size to be a multiple of the SIMD word size, which might be inconvenient if your code is smart enough to handle arbitrary-sized arrays with partial SIMD acceleration (i.e., using SIMD for most of the array, and then a slow path to handle any partial word at the end). OTOH, if your code *is* that smart, you should probably just make it smart enough to handle a partial word at the beginning as well and then you won't need any special alignment in the first place, and representing each SIMD word as a single numpy scalar is an intuitively appealing model of how SIMD works. OTOOH, just adding a single argument to np.array() is much simpler to explain than some elaborate scheme involving the creation of special custom dtypes.
If it helps, my use case is wrapping the FFTW library. FFTW _is_ smart enough to deal with unaligned arrays, but doing so incurs a performance penalty. In the case of an FFT, there are clearly going to be issues with the power-of-two indices in the array not lying on a suitable n-byte boundary (which would be the case with a misaligned array), but I imagine the problem is not unique to FFTs.

The other point is that it's easy to create a suitable power-of-two array that should always bypass any special-case unaligned code (e.g. with 4-byte floats, any array length that is a multiple of 4 will fill every 16-byte word).

Finally, I think there is significant value in auto-aligning the array based on an appropriate inspection of the CPU capabilities (or, alternatively, in a function that reports back the appropriate SIMD alignment). Again, this makes it easier to wrap libraries that may function with any alignment but benefit from optimum alignment.

Cheers,

Henry
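For illustration, here is a rough sketch of how an aligned array can be obtained from NumPy today without any new argument or dtype: over-allocate a byte buffer, skip forward to the first suitably aligned address, and view the remainder as the desired dtype. The names `aligned_empty` and `is_aligned` are hypothetical helpers, not NumPy functions, and the 16-byte default is only an assumption that suits SSE-style words; the right value depends on the CPU.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=16):
    """Return an uninitialised array whose data pointer is a multiple of
    `alignment` bytes. Hypothetical helper, not part of NumPy."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate so that an aligned start address is guaranteed to exist.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment
    # The result keeps `buf` alive through its .base chain.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


def is_aligned(arr, alignment=16):
    """True if the first element of `arr` sits on an `alignment`-byte boundary."""
    return arr.ctypes.data % alignment == 0


a = aligned_empty((8, 4), dtype=np.float32, alignment=16)
print(is_aligned(a, 16), a.shape, a.dtype)
```

With 4-byte floats and a length that is a multiple of 4, such an array both starts on a 16-byte boundary and fills every 16-byte word, so a wrapped library should never need its unaligned path for it.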