[Numpy-discussion] Byte aligned arrays

Thu Dec 20 03:53:31 EST 2012

On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
> The only scenario that I see that this would create unaligned arrays is
> for machines having AVX.  But provided that the Intel architecture is
> making great strides in fetching unaligned data, I'd be surprised that
> the difference in performance would be even noticeable.
> 
> Can you tell us which difference in performance are you seeing for an 
> AVX-aligned array and other that is not AVX-aligned?  Just curious.

Further to this point, here is what an Intel article on AVX optimization
says:

http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors

"Aligning data to vector length is always recommended. When using Intel
SSE and Intel SSE2 instructions, loaded data should be aligned to 16
bytes. Similarly, to achieve best results use Intel AVX instructions on
32-byte vectors that are 32-byte aligned. The use of Intel AVX
instructions on unaligned 32-byte vectors means that every second load
will be across a cache-line split, since the cache line is 64 bytes.
This doubles the cache line split rate compared to Intel SSE code that
uses 16-byte vectors. A high cache-line split rate in memory-intensive
code is extremely likely to cause performance degradation. For that
reason, it is highly recommended to align the data to 32 bytes for use
with Intel AVX."
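
The split rates in that passage follow from simple address arithmetic.
Here is a minimal sketch in plain Python, assuming 64-byte cache lines as
the article states. The function name and the 8-byte misalignment are
only illustrative choices, and the code counts splits rather than
measuring any real hardware:

```python
CACHE_LINE = 64  # bytes, as stated in the quoted article


def split_fraction(vector_bytes, misalignment, n_loads=1024):
    """Fraction of back-to-back vector loads that straddle a cache line.

    Loads start at address `misalignment` and advance by `vector_bytes`.
    """
    splits = 0
    for i in range(n_loads):
        start = misalignment + i * vector_bytes
        end = start + vector_bytes - 1  # address of the last byte loaded
        if start // CACHE_LINE != end // CACHE_LINE:
            splits += 1
    return splits / n_loads


print("SSE, aligned:     ", split_fraction(16, 0))
print("SSE, off by 8:    ", split_fraction(16, 8))
print("AVX, aligned:     ", split_fraction(32, 0))
print("AVX, off by 8:    ", split_fraction(32, 8))
```

With an 8-byte offset, one in four 16-byte loads crosses a line boundary
versus one in two 32-byte loads, which is the doubling the article
describes.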

Though it would be nice to put together a little example of this!
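
As a possible starting point for such an example, one common way to get a
32-byte aligned NumPy array is to over-allocate a byte buffer and slice
it at the first aligned address. This is only a sketch: `aligned_empty`
and its `shift` parameter are hypothetical helpers, not part of NumPy,
and nothing here shows that alignment changes performance on any
particular machine or NumPy build.

```python
import numpy as np


def aligned_empty(n, dtype=np.float64, align=32, shift=0):
    """Return an uninitialized 1-D array whose data pointer sits `shift`
    bytes past an `align`-byte boundary (hypothetical helper)."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate so an aligned start address is guaranteed to exist.
    raw = np.empty(nbytes + align + shift, dtype=np.uint8)
    # Bytes to skip to reach the next multiple of `align`, plus the shift.
    offset = (-raw.ctypes.data) % align + shift
    return raw[offset:offset + nbytes].view(dtype)


a = aligned_empty(1000)            # 32-byte aligned
b = aligned_empty(1000, shift=8)   # deliberately 8 bytes off

print(a.ctypes.data % 32, b.ctypes.data % 32)
```

Timing the same operation on `a` and `b` (for instance with `timeit`)
would then give a rough comparison, with the caveat that the result
depends on whether the underlying loops actually use AVX loads.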

Henry