On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: <snip>
Finally, I think there is significant value in auto-aligning the array based on an appropriate inspection of the cpu capabilities (or alternatively, a function that reports back the appropriate SIMD alignment). Again, this makes it easier to wrap libraries that may function with any alignment, but benefit from optimum alignment.
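For illustration only, a function that reports an appropriate SIMD alignment could look roughly like the sketch below. The names `alignment_from_flags`, `read_cpu_flags` and `simd_alignment` are hypothetical and are not part of NumPy. The flag-to-alignment table is an assumption based on vector register widths, and the detection reads `/proc/cpuinfo`, so it only works on Linux and falls back to 16 bytes elsewhere.

```python
# Assumed mapping from CPU feature flag to preferred alignment in bytes,
# taken from the vector register width: AVX-512 = 64, AVX = 32, SSE2 = 16.
_ALIGN_BY_FLAG = (("avx512f", 64), ("avx", 32), ("sse2", 16))


def alignment_from_flags(flags, default=16):
    """Return the widest alignment implied by a set of CPU flag names."""
    for flag, align in _ALIGN_BY_FLAG:
        if flag in flags:
            return align
    return default


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU flags from /proc/cpuinfo, or an empty set if unavailable."""
    try:
        with open(path) as fh:
            for line in fh:
                # x86 Linux lists features on a line such as "flags : fpu sse2 avx"
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def simd_alignment():
    """Best-effort guess at the SIMD alignment for the current machine."""
    return alignment_from_flags(read_cpu_flags())


print(simd_alignment())
```

A wrapped library could then request buffers aligned to `simd_alignment()` bytes without hard-coding a value per platform.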
Hmm, NumPy seems to return data blocks that are aligned to 16 bytes on systems (Linux and Mac OSX): <snip>
That is not true, at least under 32-bit Windows. I also think it's not true for 32-bit Linux, from my vague recollection of testing in a virtual machine. (Disclaimer: both of those statements _may_ be out of date.) But yes, under 64-bit Linux I always get my arrays aligned to 16 bytes.
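For anyone who wants to check this on their own platform, a minimal sketch follows. The helpers `data_address` and `is_aligned` are hypothetical names introduced here; the address comes from the array's `__array_interface__`, which is the same value that `arr.ctypes.data` reports. The result depends on the platform's allocator, so no particular output is claimed.

```python
import numpy as np


def data_address(arr):
    """Address of the first byte of the array's data buffer."""
    return arr.__array_interface__["data"][0]


def is_aligned(arr, boundary):
    """True if the array's data starts on a multiple of `boundary` bytes."""
    return data_address(arr) % boundary == 0


a = np.empty(1000, dtype=np.float64)
for boundary in (16, 32):
    # 16 bytes is the SSE register width, 32 bytes the AVX register width.
    print(boundary, is_aligned(a, boundary))
```

Note that a sliced view such as `a[1:]` starts 8 bytes later, so it can be unaligned even when the parent array is aligned.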
The only scenario I can see in which this would produce unaligned arrays is on machines with AVX. But given that the Intel architecture is making great strides in fetching unaligned data, I'd be surprised if the difference in performance were even noticeable.
Can you tell us what difference in performance you are seeing between an AVX-aligned array and one that is not AVX-aligned? Just curious.
I don't know; I don't own a machine with AVX ;) It might be that the difference is negligible, though I do think it would be _nice_ to have the arrays properly aligned if it's not too difficult. Cheers, Henry
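As a rough sketch of how aligned arrays can be obtained today without changes to NumPy itself, one common workaround is to over-allocate a byte buffer and slice it at the first aligned address. The function name `aligned_empty` and its `align` parameter are hypothetical, not a NumPy API, and the sketch assumes `align` is a multiple of the dtype's item size.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, align=32):
    """Return an uninitialised array whose data starts on an `align`-byte boundary."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits in the buffer.
    buf = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip to reach the next multiple of `align`.
    offset = -buf.ctypes.data % align
    # The returned view keeps `buf` alive through its base reference.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


x = aligned_empty((3, 4), align=32)
print(x.ctypes.data % 32)
```

The cost is at most `align` wasted bytes per array, which is why doing this inside the allocator would be the tidier option.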