Note also that OpenBLAS claims performance as good as MKL's on Sandy Bridge processors.<div><br><div><a href="https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf">https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf</a></div>
<div><br></div><div>George Nurser.</div><div><br><br><div class="gmail_quote">On 10 November 2012 00:38, Dag Sverre Seljebotn <span dir="ltr"><<a href="mailto:d.s.seljebotn@astro.uio.no" target="_blank">d.s.seljebotn@astro.uio.no</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 11/09/2012 11:57 PM, Matthieu Brucher wrote:<br>
> Hi,<br>
><br>
> A.A being slower than A.A' is not a surprise to me. The latter is far<br>
> more cache-friendly than the former: everything follows cache lines,<br>
> so it is faster than an access pattern that uses one element from each<br>
> cache line. In fact, this is exactly what "proves" that the new<br>
> version is correct.<br>
> Good job (if all the tests were made and still pass ;) )<br>
<br>
</div>Cache lines shouldn't matter much with a decent BLAS?<br>
<br>
<a href="http://dl.acm.org/citation.cfm?id=1356053" target="_blank">http://dl.acm.org/citation.cfm?id=1356053</a><br>
<br>
(Googling for "Anatomy of High-Performance Matrix Multiplication" will<br>
give you a preprint outside the paywall, but Google won't give me the<br>
URL of the search result, so I can't paste it.)<br>
<br>
Dag Sverre<br>
<div class="im"><br>
><br>
> Cheers,<br>
><br>
> Matthieu<br>
><br>
><br>
> 2012/11/9 Nicolas SCHEFFER <<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a><br>
</div>> <mailto:<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a>>><br>
<div><div class="h5">><br>
> Ok: comparing apples to apples. I'm puzzled by my observations and<br>
> would need input from you guys.<br>
><br>
> Using ATLAS 3.10, numpy with and without my changes, I'm getting these<br>
> timings and comparisons.<br>
><br>
> #<br>
> #I. Generate matrices using regular dot:<br>
> #<br>
> big = np.array(np.random.randn(2000, 2000), 'f');<br>
> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),<br>
> left=big.T.dot(big), right=big.dot(big.T))<br>
><br>
> #<br>
> #II. Timings with regular dot<br>
> #<br>
> In [3]: %timeit np.dot(big, big)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [4]: %timeit np.dot(big, big.T)<br>
> 10 loops, best of 3: 166 ms per loop<br>
><br>
> In [5]: %timeit np.dot(big.T, big.T)<br>
> 10 loops, best of 3: 193 ms per loop<br>
><br>
> In [6]: %timeit np.dot(big.T, big)<br>
> 10 loops, best of 3: 165 ms per loop<br>
><br>
> #<br>
> #III. I load these arrays and time again with the "fast" dot<br>
> #<br>
> In [21]: %timeit np.dot(big, big)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [22]: %timeit np.dot(big.T, big)<br>
> 10 loops, best of 3: 104 ms per loop<br>
><br>
> In [23]: %timeit np.dot(big.T, big.T)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [24]: %timeit np.dot(big, big.T)<br>
> 10 loops, best of 3: 102 ms per loop<br>
><br>
> 1. A'.A': great!<br>
> 2. A.A' becomes faster than A.A !?!<br>
><br>
> #<br>
> #IV. Residuals (root of summed squared differences)<br>
> #<br>
> In [25]: np.sqrt(((arr['none'] - none)**2).sum())<br>
> Out[25]: 0.0<br>
><br>
> In [26]: np.sqrt(((arr['both'] - both)**2).sum())<br>
> Out[26]: 0.0<br>
><br>
> In [27]: np.sqrt(((arr['left'] - left)**2).sum())<br>
> Out[27]: 0.015900515<br>
><br>
> In [28]: np.sqrt(((arr['right'] - right)**2).sum())<br>
> Out[28]: 0.015331409<br>
><br>
> #<br>
> # CCl<br>
> #<br>
> While the residuals are small, I'm wondering whether:<br>
> - It's a bug: the results should be exactly the same<br>
> - It's a feature: BLAS takes shortcuts when you have A.A'. The<br>
> difference is not significant. Quick: PR that asap!<br>
><br>
> I don't have enough expertise to answer that...<br>
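[Editor's note: the bug-vs-feature question has a third reading. The residual is consistent with BLAS choosing a different summation order for the transposed cases, which changes the rounding without being wrong, since floating-point addition is not associative. A small hypothetical illustration, not from the thread:]<br>

```python
import numpy as np

# Floating-point addition is not associative: grouping changes rounding.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same product computed through a transposed view vs. an explicit
# contiguous copy; any nonzero residual here is rounding noise of the
# same kind, because BLAS may pick a different path for each case.
rng = np.random.default_rng(0)
a = np.asarray(rng.standard_normal((500, 500)), dtype=np.float32)

p_view = a.T @ a          # BLAS sees a transposed operand
p_copy = a.T.copy() @ a   # plain GEMM on a contiguous copy

resid = np.sqrt(((p_view - p_copy) ** 2).sum())  # the thread's metric
print('residual:', resid)
```

The residual may be exactly zero or tiny depending on the BLAS build; a value around 1e-2 on 2000x2000 float32 inputs, as measured above, is plausible rounding noise rather than a wrong result.<br>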
><br>
> Thanks much!<br>
><br>
> -nicolas<br>
> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER<br>
</div></div><div class="im">> <<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a> <mailto:<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a>>> wrote:<br>
> > I too encourage users to use scipy.linalg for speed and robustness<br>
> > (hence calling this scipy.dot), but it just brings so much confusion!<br>
> > When using the scipy + numpy ecosystem, you'd almost want everything<br>
> > to be done with scipy so that you get the best implementation in all<br>
> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().<br>
> ><br>
> > Anyway, this is indeed for another thread; the confusion we'd like<br>
> > to fix here is that users shouldn't have to understand C/F-contiguity<br>
> > to get the maximum speed out of np.dot().<br>
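[Editor's note: Nicolas's snippet itself isn't quoted in this message. A minimal sketch of the idea, with a hypothetical helper name and assuming the scipy.linalg.blas gemm wrappers: inspect the contiguity flags and hand BLAS a transpose flag instead of letting a transposed view be copied.]<br>

```python
import numpy as np
from scipy.linalg import blas

def _fortran_view(x):
    """Return (F-contiguous 2-D array, trans flag) avoiding copies:
    the transpose of a C-contiguous array is already F-contiguous."""
    if x.flags['F_CONTIGUOUS']:
        return x, 0
    if x.flags['C_CONTIGUOUS']:
        return x.T, 1
    return np.asfortranarray(x), 0  # last resort: one copy

def fast_dot(a, b):
    """Hypothetical contiguity-aware dot for 2-D float32/float64
    arrays of matching dtype; passes BLAS transpose flags so that
    op(a)·op(b) is computed without intermediate copies."""
    ga, ta = _fortran_view(a)
    gb, tb = _fortran_view(b)
    gemm = blas.sgemm if a.dtype == np.float32 else blas.dgemm
    return gemm(1.0, ga, gb, trans_a=ta, trans_b=tb)

# Quick check against np.dot
a = np.random.randn(300, 200)
b = np.random.randn(200, 100)
c = np.random.randn(300, 50)
print(np.allclose(fast_dot(a, b), np.dot(a, b)))      # True
print(np.allclose(fast_dot(a.T, c), np.dot(a.T, c)))  # True
```

This is a sketch, not the actual posted snippet: it ignores 1-D inputs, mixed dtypes, and non-float arrays, all of which a real np.dot replacement would have to handle.<br>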
> ><br>
> > To summarize:<br>
> > - The python snippet I posted is still valid and can speed up your<br>
> > code if you can change all your dot() calls.<br>
> > - The change in dotblas.c is a bit more problematic because it<br>
> > touches very core code. I'm having trouble replicating the timings:<br>
> > I get better timings for a.dot(a.T) than for a.dot(a). There might be a bug.<br>
> ><br>
> > It's a pain to test because I cannot do the test in a single<br>
> python session.<br>
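[Editor's note: for the timing half of the comparison, at least, a standalone script can exercise all four transpose combinations in one run per numpy build; a sketch, with the size shrunk from the thread's 2000x2000 to keep it quick.]<br>

```python
import timeit
import numpy as np

# Time the four transpose combinations of np.dot in one process.
big = np.asarray(np.random.randn(500, 500), dtype='f')

cases = {
    "A.A":   lambda: np.dot(big, big),
    "A.A'":  lambda: np.dot(big, big.T),
    "A'.A":  lambda: np.dot(big.T, big),
    "A'.A'": lambda: np.dot(big.T, big.T),
}
times = {}
for name, fn in cases.items():
    # Best of three single runs, as %timeit's "best of" does.
    times[name] = min(timeit.repeat(fn, number=1, repeat=3))
    print('%-6s %.4f s' % (name, times[name]))
```

Run once against each numpy build and compare the printed tables; that avoids needing both builds importable in the same session.<br>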
> > I'm going to try to integrate most of your suggestions, I cannot<br>
> > guarantee I'll have time to do them all though.<br>
> ><br>
> > -nicolas<br>
> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <<a href="mailto:njs@pobox.com">njs@pobox.com</a><br>
</div><div class="im">> <mailto:<a href="mailto:njs@pobox.com">njs@pobox.com</a>>> wrote:<br>
> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux<br>
> >> <<a href="mailto:gael.varoquaux@normalesup.org">gael.varoquaux@normalesup.org</a><br>
</div><div><div class="h5">> <mailto:<a href="mailto:gael.varoquaux@normalesup.org">gael.varoquaux@normalesup.org</a>>> wrote:<br>
> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:<br>
> >>>> But what if someone compiles numpy against an optimized blas<br>
> >>>> (mkl, say) and then compiles SciPy against the reference blas?<br>
> >>>> What do you do then!? ;-)<br>
> >>><br>
> >>> This could happen. But the converse happens very often. What<br>
> >>> happens is that users (e.g. on a shared computing resource) ask<br>
> >>> for a scientific python environment. The administrator then<br>
> >>> installs the packages starting from the most basic one up to the<br>
> >>> most advanced one, thus starting with numpy, which can very well<br>
> >>> build without any external blas. When he gets to scipy, he hits<br>
> >>> the problem that the build system does not properly detect the<br>
> >>> blas, and he solves that problem.<br>
> >>><br>
> >>> Also, it used to be that on the major linux distributions numpy<br>
> >>> would not be built with an optimized lapack, because numpy was in<br>
> >>> the 'base' set of packages but lapack was not. Scipy, on the<br>
> >>> contrary, being in the 'contrib' set, could depend on lapack. I<br>
> >>> just checked, and this has been fixed in the major distributions<br>
> >>> (Fedora, Debian, Ubuntu).<br>
> >>><br>
> >>> Now, we can discuss whether such problems should happen and put<br>
> >>> the blame on the users/administrators; the fact is that they<br>
> >>> happen often. I keep seeing environments in which np.linalg is<br>
> >>> unreasonably slow.<br>
> >><br>
> >> If this is something that's been a problem for you, maybe we should<br>
> >> start another thread on things we could do to fix it directly?<br>
> >> Improve build instructions, advertise build systems that set up the<br>
> >> whole environment (and thus do the right thing), make numpy's<br>
> >> setup.py scream and yell if blas isn't available...?<br>
> >><br>
> >> -n<br>
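[Editor's note: on the diagnosis side, a check users can already run today is to ask numpy which BLAS/LAPACK it was built against and to eyeball a matmul timing; output format varies across NumPy versions.]<br>

```python
import time
import numpy as np

# Print the build-time BLAS/LAPACK configuration; an unoptimized build
# typically shows no optimized libraries here.
np.show_config()

# Crude runtime sanity check: a mid-sized matmul is roughly an order of
# magnitude slower with the reference BLAS than with ATLAS/OpenBLAS/MKL.
a = np.random.randn(1000, 1000)
t0 = time.time()
c = a.dot(a)
print('1000x1000 dot: %.3f s' % (time.time() - t0))
```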
> >> _______________________________________________<br>
> >> NumPy-Discussion mailing list<br>
</div></div>> >> <a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a> <mailto:<a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a>><br>
<div class="im">> >> <a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>
</div><div class="HOEnZb"><div class="h5">
><br>
><br>
><br>
><br>
> --<br>
> Information System Engineer, Ph.D.<br>
> Blog: <a href="http://matt.eifelle.com" target="_blank">http://matt.eifelle.com</a><br>
> LinkedIn: <a href="http://www.linkedin.com/in/matthieubrucher" target="_blank">http://www.linkedin.com/in/matthieubrucher</a><br>
> Music band: <a href="http://liliejay.com/" target="_blank">http://liliejay.com/</a><br>
><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div></div>