On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
The only scenario I can see in which this would create unaligned arrays is on machines with AVX. But given that the Intel architecture is making great strides in fetching unaligned data, I'd be surprised if the difference in performance were even noticeable.
Can you tell us what difference in performance you are seeing between an AVX-aligned array and one that is not AVX-aligned? Just curious.
Further to this point, from an Intel article...

http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on...

"Aligning data to vector length is always recommended. When using Intel SSE and Intel SSE2 instructions, loaded data should be aligned to 16 bytes. Similarly, to achieve best results use Intel AVX instructions on 32-byte vectors that are 32-byte aligned. The use of Intel AVX instructions on unaligned 32-byte vectors means that every second load will be across a cache-line split, since the cache line is 64 bytes. This doubles the cache line split rate compared to Intel SSE code that uses 16-byte vectors. A high cache-line split rate in memory-intensive code is extremely likely to cause performance degradation. For that reason, it is highly recommended to align the data to 32 bytes for use with Intel AVX."

Though it would be nice to put together a little example of this!

Henry
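
P.S. As a starting point, here is a rough sketch of the arithmetic behind the "every second load" claim in the quoted passage. It is not a benchmark and measures nothing on real hardware; it only counts how many consecutive vector loads would straddle a 64-byte cache line. The function name `split_rate` and the 8-byte starting offset are assumptions chosen for illustration, not anything taken from the article or from NumPy.

```python
CACHE_LINE = 64  # bytes, the cache-line size stated in the Intel article


def split_rate(vector_bytes, start_offset, n_loads=1024):
    """Return the fraction of consecutive vector loads that cross a cache line.

    vector_bytes: width of each load (16 for SSE, 32 for AVX)
    start_offset: address of the first load, relative to a cache-line boundary
    """
    splits = 0
    for i in range(n_loads):
        first = start_offset + i * vector_bytes   # first byte touched
        last = first + vector_bytes - 1           # last byte touched
        if first // CACHE_LINE != last // CACHE_LINE:
            splits += 1
    return splits / n_loads


for name, width in (("SSE", 16), ("AVX", 32)):
    aligned = split_rate(width, 0)
    # Hypothetical buffer that is only 8-byte aligned.
    misaligned = split_rate(width, 8)
    print(f"{name}: aligned {aligned:.2f}, misaligned {misaligned:.2f}")
```

Under these assumptions the misaligned AVX case splits on half of its loads and the misaligned SSE case on a quarter, which matches the doubling described in the article. Whether that shows up as a measurable slowdown would still need a real timing test.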