[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Francesc Alted faltet at gmail.com
Thu Apr 17 12:06:56 EDT 2014


Uh, 15x slower for unaligned access is quite a lot.  But Intel (and AMD) 
architectures are much more tolerant in this respect (and improving).  
For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:

In [1]: import numpy as np

In [2]: shape = (10000, 10000)

In [3]: x_aligned = np.zeros(shape, 
dtype=[('x',np.float64),('y',np.int64)])['x']

In [4]: x_unaligned = np.zeros(shape, 
dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']

In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop

In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop
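
To make explicit what the second dtype trick does: the 'x' field sits at 
byte offset 1 inside the packed 16-byte struct, so the float64 view is 
misaligned with respect to its natural 8-byte alignment.  A quick check 
(not part of the session above; the exact addresses depend on malloc, but 
typically):

x_aligned.ctypes.data % 8, x_aligned.flags.aligned      # -> (0, True)
x_unaligned.ctypes.data % 8, x_unaligned.flags.aligned  # -> (1, False)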

so the added cost in this case is just a bit more than 2x.  But you can 
also alleviate this overhead by copying the data into a buffer that fits 
in cache prior to doing the computations.  numexpr does this:

https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203

and the results are pretty good:

In [8]: import numexpr as ne

In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop

In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop

i.e. there is no significant difference between aligned and unaligned 
access to the data.

I wonder if the same technique could be applied to NumPy.
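
A rough sketch of the idea in pure NumPy, just to illustrate the structure 
(the function name and block size below are made up; numexpr's real 
implementation is the C loop linked above, and in pure Python the extra 
copy would probably eat most of the benefit): each block is copied into a 
freshly allocated, hence malloc-aligned, scratch buffer that fits in 
cache, and the computation only ever touches the aligned copy.

import numpy as np

def square_blocked(x, block_rows=16):
    # Process x in row blocks: copy each block into a freshly allocated
    # (and therefore malloc-aligned) scratch array, then square the copy.
    out = np.empty(x.shape, dtype=x.dtype)
    scratch = np.empty((block_rows,) + x.shape[1:], dtype=x.dtype)
    for i in range(0, x.shape[0], block_rows):
        j = min(i + block_rows, x.shape[0])
        n = j - i
        scratch[:n] = x[i:j]        # unaligned -> aligned copy (block-sized)
        np.multiply(scratch[:n], scratch[:n], out=out[i:j])
    return out

res = square_blocked(x_unaligned)   # same result as x_unaligned ** 2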

Francesc


On 17/04/14 16:26, Aron Ahmadia wrote:
> Hmnn, I wasn't being clear :)
>
> The default malloc on BlueGene/Q only returns 8-byte alignment, but 
> the SIMD units need 32-byte alignment for loads, stores, and 
> operations, or performance suffers.  On the /P the required alignment 
> was 16 bytes, but malloc only gave you 8, and trying to perform 
> vectorized loads/stores generated alignment exceptions on unaligned 
> memory.
>
> See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and 
> https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slides 
> 14 for overview, 15 for the effective performance difference between 
> the unaligned/aligned code) for some notes on this.
>
> A
>
>
>
>
> On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <njs at pobox.com 
> <mailto:njs at pobox.com>> wrote:
>
>     On 17 Apr 2014 15:09, "Aron Ahmadia" <aron at ahmadia.net
>     <mailto:aron at ahmadia.net>> wrote:
>     >
>     > > On the one hand it would be nice to actually know whether
>     posix_memalign is important, before making api decisions on this
>     basis.
>     >
>     > FWIW: On the lightweight IBM cores that the extremely popular
>     BlueGene machines were based on, accessing unaligned memory raised
>     system faults.  The default behavior of these machines was to
>     terminate the program if more than 1000 such errors occurred on a
>     given process, and an environment variable allowed you to
>     terminate the program if *any* unaligned memory access occurred.
>      This is because unaligned memory accesses were 15x (or more)
>     slower than aligned memory access.
>     >
>     > The newer /Q chips seem to be a little more forgiving of this,
>     but I think one can in general expect allocated memory alignment
>     to be an important performance technique for future high
>     performance computing architectures.
>
>     Right, this much is true on lots of architectures, and so malloc
>     is careful to always return values with sufficient alignment (e.g.
>     8 bytes) to make sure that any standard operation can succeed.
>
>     The question here is whether it will be important to have *even
>     more* alignment than malloc gives us by default. A 16 or 32 byte
>     wide SIMD instruction might prefer that data have 16 or 32 byte
>     alignment, even if normal memory access for the types being
>     operated on only requires 4 or 8 byte alignment.
>
>     -n
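
A minimal sketch of that last point (not from the original thread; the 
helper name is made up): one way to get, say, 32-byte-aligned float64 
storage from NumPy today is to over-allocate a byte buffer and offset to 
the next aligned address, the usual workaround when an aligned allocator 
such as posix_memalign is not plumbed through:

import numpy as np

def aligned_zeros(shape, dtype=np.float64, alignment=32):
    # Over-allocate a byte buffer, then slice from the first address that
    # is a multiple of `alignment` and reinterpret it as `dtype`.
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.zeros(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_zeros((1000, 1000))
assert a.ctypes.data % 32 == 0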


-- 
Francesc Alted
