[Numpy-discussion] Byte aligned arrays

Henry Gomersall heng at cantab.net
Thu Dec 20 13:35:20 EST 2012

On Thu, 2012-12-20 at 15:23 +0100, Francesc Alted wrote:
> On 12/20/12 9:53 AM, Henry Gomersall wrote:
> > On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
> >> The only scenario that I see that this would create unaligned arrays
> >> is for machines having AVX.  But provided that the Intel architecture
> >> is making great strides in fetching unaligned data, I'd be surprised
> >> that the difference in performance would be even noticeable.
> >>
> >> Can you tell us which difference in performance are you seeing for an
> >> AVX-aligned array and other that is not AVX-aligned?  Just curious.
> > Further to this point, from an Intel article...
> >
> > http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors
> >
> > "Aligning data to vector length is always recommended. When using Intel
> > SSE and Intel SSE2 instructions, loaded data should be aligned to 16
> > bytes. Similarly, to achieve best results use Intel AVX instructions on
> > 32-byte vectors that are 32-byte aligned. The use of Intel AVX
> > instructions on unaligned 32-byte vectors means that every second load
> > will be across a cache-line split, since the cache line is 64 bytes.
> > This doubles the cache line split rate compared to Intel SSE code that
> > uses 16-byte vectors. A high cache-line split rate in memory-intensive
> > code is extremely likely to cause performance degradation. For that
> > reason, it is highly recommended to align the data to 32 bytes for use
> > with Intel AVX."
> >
> > Though it would be nice to put together a little example of this!
> Indeed, an example is what I was looking for.  So provided that I have
> access to an AVX capable machine (having 6 physical cores), and that MKL
> 10.3 has support for AVX, I have made some comparisons using the
> Anaconda Python distribution (it ships with most packages linked against
> MKL 10.3).


> All in all, it is not clear that AVX alignment would have an advantage,
> even for memory-bounded problems.  But of course, if Intel people are
> saying that AVX alignment is important is because they have use cases
> for asserting this.  It is just that I'm having a difficult time to find
> these cases.

Thanks for those examples, they were very interesting. I managed to
temporarily get my hands on a machine with AVX and I have shown some
speed-up with aligned arrays.
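
The cache-line arithmetic in the Intel passage quoted above can be checked
with a small counting sketch. It assumes a 64-byte cache line and a stream
of back-to-back vector loads starting at a fixed byte offset; the function
name `split_fraction` is hypothetical, and this only counts boundary
crossings, it does not measure performance.

```python
CACHE_LINE = 64  # bytes, the line size given in the Intel passage


def split_fraction(vector_bytes, start_offset, n_loads=1024):
    """Fraction of consecutive vector loads that straddle a cache line."""
    splits = 0
    for k in range(n_loads):
        first = start_offset + k * vector_bytes  # first byte of load k
        last = first + vector_bytes - 1          # last byte of load k
        # A load is split when its first and last bytes fall in
        # different 64-byte lines.
        if first // CACHE_LINE != last // CACHE_LINE:
            splits += 1
    return splits / n_loads


print(split_fraction(32, 0))   # 32-byte loads, 32-byte aligned   -> 0.0
print(split_fraction(32, 16))  # 32-byte loads, 16 bytes off      -> 0.5
print(split_fraction(16, 8))   # 16-byte loads, 8 bytes off       -> 0.25
```

With these assumptions, unaligned 32-byte loads split on every second
load, twice the rate of unaligned 16-byte loads.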

FFT (using my wrappers) gives about a 15% speedup.

Also this convolution code:

Shows a small but repeatable speed-up (a few %) when using some aligned
loads (as many as I can work out to use!).
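
For anyone wanting to try a similar comparison, one common way to get an
aligned array in plain NumPy is to over-allocate a byte buffer and slice
it at the first aligned address. This is only a sketch, not the code used
for the numbers above; `aligned_empty` is a hypothetical helper name and
the 32-byte default is an assumption matching the AVX vector width.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=32):
    """Return an uninitialised array whose data address is a multiple
    of `alignment` bytes (hypothetical helper, for illustration)."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate so an aligned start address must exist in the buffer.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment
    # The slice keeps a reference to `buf`, so the memory stays alive.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


a = aligned_empty((1024,), dtype=np.float64, alignment=32)
print(a.ctypes.data % 32 == 0)
```

An unaligned counterpart for comparison can be made the same way by
adding a non-multiple of the alignment (for example 8 bytes) to `offset`.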


