[Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

Aron Ahmadia aron at ahmadia.net
Thu Apr 17 10:26:37 EDT 2014

Hmm, I wasn't being clear :)

The default malloc on BlueGene/Q only returns 8-byte alignment, but the
SIMD units need 32-byte alignment for loads, stores, and operations or
performance suffers.  On the /P the required alignment was 16 bytes, but
malloc only gave you 8, and trying to perform vectorized loads/stores on
unaligned memory generated alignment exceptions.

See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and
https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slide 14
for an overview, slide 15 for the effective performance difference between
the unaligned/aligned code) for some notes on this.
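
To make the workaround concrete, here is a minimal sketch (not from the
slides above) of the usual trick when the allocator's default alignment is
too small: over-allocate by the alignment and start at the first aligned
address inside the buffer. The helper name aligned_buffer and the 32-byte
default are assumptions for illustration only; it uses just the standard
ctypes module.

```python
import ctypes

def aligned_buffer(nbytes, alignment=32):
    """Hypothetical helper: return a writable buffer of `nbytes` bytes
    whose start address is a multiple of `alignment`."""
    # Over-allocate so that an aligned address always exists inside `raw`.
    raw = ctypes.create_string_buffer(nbytes + alignment)
    # Distance from the start of `raw` to the next aligned address (0 if already aligned).
    offset = (-ctypes.addressof(raw)) % alignment
    # View `nbytes` bytes of `raw` starting at that offset; the view keeps `raw` alive.
    return (ctypes.c_char * nbytes).from_buffer(raw, offset)

buf = aligned_buffer(1024, alignment=32)
print(ctypes.addressof(buf) % 32)
```

A buffer like this exposes the buffer protocol, so it should be possible to
wrap it with numpy.frombuffer; on POSIX systems posix_memalign gives the
same guarantee directly from the allocator.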


On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <njs at pobox.com> wrote:

> On 17 Apr 2014 15:09, "Aron Ahmadia" <aron at ahmadia.net> wrote:
> >
> > > On the one hand it would be nice to actually know whether
> > > posix_memalign is important, before making api decisions on this basis.
> >
> > FWIW: On the lightweight IBM cores that the extremely popular BlueGene
> > machines were based on, accessing unaligned memory raised system faults.
> > The default behavior of these machines was to terminate the program if
> > more than 1000 such errors occurred on a given process, and an environment
> > variable allowed you to terminate the program if *any* unaligned memory
> > access occurred.  This is because unaligned memory accesses were 15x (or
> > more) slower than aligned memory access.
> >
> > The newer /Q chips seem to be a little more forgiving of this, but I
> > think one can in general expect allocated memory alignment to be an
> > important performance technique for future high performance computing
> > architectures.
>
> Right, this much is true on lots of architectures, and so malloc is
> careful to always return values with sufficient alignment (e.g. 8 bytes) to
> make sure that any standard operation can succeed.
> The question here is whether it will be important to have *even more*
> alignment than malloc gives us by default. A 16 or 32 byte wide SIMD
> instruction might prefer that data have 16 or 32 byte alignment, even if
> normal memory access for the types being operated on only requires 4 or 8
> byte alignment.
> -n