<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 4, 2016 at 11:42 AM, Antoine Pitrou <span dir="ltr">&lt;<a href="mailto:solipsis@pitrou.net" target="_blank">solipsis@pitrou.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, 4 Feb 2016 21:22:32 +1000<br>

Nick Coghlan &lt;<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>&gt; wrote:<br>

><br>

> I figured that was independent of the manylinux PEP (since it affects<br>

> Windows as well), but I'm also curious as to the current status (I<br>

> found a couple of apparently relevant threads on the NumPy list, but<br>

> figured it made more sense to just ask for an update rather than<br>

> trusting my Google-fu)<br>

<br>

</span>While I'm not a Numpy maintainer, I don't think you can go much further<br>

than SSE2 (which is standard under the x86-64 definition).<br>

<br>

One factor is support by the kernel. The CentOS 5 kernel doesn't<br>

seem to support AVX, so you can't use AVX there even if your processor<br>

supports it (as the registers aren't preserved across context<br>

switches). And one design point of manylinux is to support old Linux<br>

setups... (*)<br></blockquote><div><br></div><div>I don't have precise numbers, but I can confirm that from time to time we get customer reports related to AVX not being supported (because of the CPU or the OS).<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

There are intermediate ISA additions between SSE2 and AVX (additions<br>

that don't require OS support), but I'm not sure they help much on<br>

compiler-vectorized code as opposed to hand-written assembly.  Numpy's<br>

pre-compiled loops are typically quite straightforward as far as I've<br>

seen.<br>

<br>

One mitigation is to delegate some operations to an optimized library<br>

implementing the appropriate runtime switches: for example linear<br>

algebra is delegated by Numpy and Scipy to optimized BLAS and LAPACK<br>

libraries (which exist in various implementations such as OpenBLAS or<br>

Intel's MKL).<br>
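<br>
As a rough illustration of such a runtime switch, here is a minimal Python sketch.<br>
It is not how OpenBLAS or MKL actually do it. The function names are hypothetical,<br>
the two "kernels" are plain Python stand-ins for differently compiled builds, and<br>
it assumes a Linux x86 system where /proc/cpuinfo has a "flags" line listing the<br>
features the running kernel exposes. If the file or the line is missing, it falls<br>
back to the baseline path.<br>
<pre>
```python
from pathlib import Path


def read_cpu_flags(cpuinfo_text):
    """Return the feature flags found in /proc/cpuinfo-style text.

    Assumption: x86 Linux layout, where a line starts with "flags" and
    lists space-separated feature names after a colon.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def current_cpu_flags(path="/proc/cpuinfo"):
    """Read flags for the running machine; empty set if unavailable."""
    try:
        return read_cpu_flags(Path(path).read_text())
    except OSError:
        return set()


def kernel_sum_baseline(values):
    # Hypothetical stand-in for a build that only assumes SSE2.
    return sum(values)


def kernel_sum_avx(values):
    # Hypothetical stand-in for a build compiled with AVX enabled.
    # It must return the same result as the baseline.
    return sum(values)


def select_kernel(flags):
    """Pick an implementation once, based on the detected flags."""
    if "avx" in flags:
        return kernel_sum_avx
    return kernel_sum_baseline


# The choice is made at import time, so callers just use kernel_sum.
kernel_sum = select_kernel(current_cpu_flags())
```
</pre>
Callers would use kernel_sum([1, 2, 3]) without knowing which variant was picked,<br>
which is what lets one binary serve both old and new machines.<br>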

<br>

(*) (this is an issue a JIT compiler helps circumvent: it generates<br>

optimal code for the current CPU ;-))<br>

<br>

Regards<br>

<span class="HOEnZb"><font color="#888888"><br>

Antoine.<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>


</div></div></blockquote></div><br></div></div>