[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
Francesc Alted
faltet at gmail.com
Thu Apr 17 12:06:56 EDT 2014
Uh, 15x slower for unaligned access is quite a lot. But Intel (and AMD)
architectures are much more tolerant in this respect (and improving).
For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:
In [1]: import numpy as np
In [2]: shape = (10000, 10000)
In [3]: x_aligned = np.zeros(shape, dtype=[('x',np.float64),('y',np.int64)])['x']
In [4]: x_unaligned = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop
In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop
so the added cost in this case is just a bit more than 2x. But you can
also alleviate this overhead if you copy the data into a buffer that fits
in cache before doing the computations. numexpr does this:
https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203
and the results are pretty good:
In [8]: import numexpr as ne
In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop
In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop
i.e. there is not a significant difference between aligned and unaligned
access to data.
I wonder if the same technique could be applied to NumPy.
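A minimal sketch of what the copy-into-an-aligned-block idea could look like in pure NumPy. This is not numexpr's implementation; the function name blocked_square and the block_rows parameter are hypothetical, and the block size is an untuned guess. A Python-level loop like this only illustrates the data flow and is not expected to reproduce the numexpr timings above, since numexpr does its blocking in C.

```python
import numpy as np

def blocked_square(x, block_rows=64):
    """Square a 2-D array block by block (hypothetical helper).

    Each block of rows is first copied into a small scratch buffer that
    NumPy allocated itself (and is therefore aligned for its dtype), and
    the arithmetic then runs on that aligned copy.
    """
    out = np.empty(x.shape, dtype=x.dtype)
    # Scratch buffer meant to be small enough to stay in cache (assumption:
    # block_rows is chosen by the caller to suit the machine).
    buf = np.empty((block_rows,) + x.shape[1:], dtype=x.dtype)
    for start in range(0, x.shape[0], block_rows):
        stop = min(start + block_rows, x.shape[0])
        chunk = buf[:stop - start]
        np.copyto(chunk, x[start:stop])                 # possibly unaligned -> aligned
        np.multiply(chunk, chunk, out=out[start:stop])  # compute on the aligned copy
    return out

# Same struct layout as in the session above, with a smaller shape.
shape = (1000, 1000)
x_unaligned = np.zeros(
    shape,
    dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']
x_unaligned[...] = np.random.rand(*shape)

print(x_unaligned.flags.aligned)   # the 'x' field sits at byte offset 1
res = blocked_square(x_unaligned)
print(res.flags.aligned)
```

The flags.aligned attribute is a quick way to confirm which of the arrays in the session is actually unaligned before timing anything.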
Francesc
On 17/04/14 16:26, Aron Ahmadia wrote:
> Hmnn, I wasn't being clear :)
>
> The default malloc on BlueGene/Q only returns 8 byte alignment, but
> the SIMD units need 32-byte alignment for loads, stores, and
> operations or performance suffers. On the /P the required alignment
> was 16-bytes, but malloc only gave you 8, and trying to perform
> vectorized loads/stores generated alignment exceptions on unaligned
> memory.
>
> See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and
> https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slides
> 14 for overview, 15 for the effective performance difference between
> the unaligned/aligned code) for some notes on this.
>
> A
>
>
>
>
> On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <njs at pobox.com
> <mailto:njs at pobox.com>> wrote:
>
> On 17 Apr 2014 15:09, "Aron Ahmadia" <aron at ahmadia.net
> <mailto:aron at ahmadia.net>> wrote:
> >
> > > On the one hand it would be nice to actually know whether
> posix_memalign is important, before making api decisions on this
> basis.
> >
> > FWIW: On the lightweight IBM cores that the extremely popular
> BlueGene machines were based on, accessing unaligned memory raised
> system faults. The default behavior of these machines was to
> terminate the program if more than 1000 such errors occurred on a
> given process, and an environment variable allowed you to
> terminate the program if *any* unaligned memory access occurred.
> This is because unaligned memory accesses were 15x (or more)
> slower than aligned memory access.
> >
> > The newer /Q chips seem to be a little more forgiving of this,
> but I think one can in general expect allocated memory alignment
> to be an important performance technique for future high
> performance computing architectures.
>
> Right, this much is true on lots of architectures, and so malloc
> is careful to always return values with sufficient alignment (e.g.
> 8 bytes) to make sure that any standard operation can succeed.
>
> The question here is whether it will be important to have *even
> more* alignment than malloc gives us by default. A 16 or 32 byte
> wide SIMD instruction might prefer that data have 16 or 32 byte
> alignment, even if normal memory access for the types being
> operated on only requires 4 or 8 byte alignment.
>
> -n
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
>
--
Francesc Alted