Note also that OpenBLAS claims performance as good as MKL's on Sandy Bridge processors.<div><br><div><a href="https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf">https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf</a></div>
<div><br></div><div>George Nurser.</div><div><br><br><div class="gmail_quote">On 10 November 2012 00:38, Dag Sverre Seljebotn <span dir="ltr"><<a href="mailto:d.s.seljebotn@astro.uio.no" target="_blank">d.s.seljebotn@astro.uio.no</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 11/09/2012 11:57 PM, Matthieu Brucher wrote:<br>
> Hi,<br>
><br>
> A.A being slower than A.A' is not a surprise to me. The latter is far<br>
> more cache-friendly than the former: everything follows cache lines,<br>
> so it is faster than an access pattern that uses one element from each<br>
> cache line. In fact, this is exactly what "proves" that the new<br>
> version is correct.<br>
> Good job (if all the tests were made and still pass ;) )<br>
<br>
</div>Cache lines shouldn't matter much with a decent BLAS?<br>
<br>
<a href="http://dl.acm.org/citation.cfm?id=1356053" target="_blank">http://dl.acm.org/citation.cfm?id=1356053</a><br>
<br>
(Googling for "Anatomy of High-Performance Matrix Multiplication" will<br>
give you a preprint outside the paywall, but Google won't give me the<br>
URL of the search result, so I can't paste it.)<br>
<br>
Dag Sverre<br>
<div class="im"><br>
><br>
> Cheers,<br>
><br>
> Matthieu<br>
><br>
><br>
> 2012/11/9 Nicolas SCHEFFER <<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a><br>
</div>> <mailto:<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a>>><br>
<div><div class="h5">><br>
> Ok: comparing apples to apples. I'm puzzled by my observations and<br>
> would need input from you guys.<br>
><br>
> Using ATLAS 3.10, numpy with and without my changes, I'm getting these<br>
> timings and comparisons.<br>
><br>
> #<br>
> #I. Generate matrices using regular dot:<br>
> #<br>
> big = np.array(np.random.randn(2000, 2000), 'f');<br>
> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),<br>
> left=big.T.dot(big), right=big.dot(big.T))<br>
><br>
> #<br>
> #II. Timings with regular dot<br>
> #<br>
> In [3]: %timeit np.dot(big, big)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [4]: %timeit np.dot(big, big.T)<br>
> 10 loops, best of 3: 166 ms per loop<br>
><br>
> In [5]: %timeit np.dot(big.T, big.T)<br>
> 10 loops, best of 3: 193 ms per loop<br>
><br>
> In [6]: %timeit np.dot(big.T, big)<br>
> 10 loops, best of 3: 165 ms per loop<br>
><br>
> #<br>
> #III. I load these arrays and time again with the "fast" dot<br>
> #<br>
> In [21]: %timeit np.dot(big, big)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [22]: %timeit np.dot(big.T, big)<br>
> 10 loops, best of 3: 104 ms per loop<br>
><br>
> In [23]: %timeit np.dot(big.T, big.T)<br>
> 10 loops, best of 3: 138 ms per loop<br>
><br>
> In [24]: %timeit np.dot(big, big.T)<br>
> 10 loops, best of 3: 102 ms per loop<br>
><br>
> 1. A'.A': great!<br>
> 2. A.A' becomes faster than A.A !?!<br>
><br>
> #<br>
> #IV. Residuals (root of summed squared differences)<br>
> #<br>
> In [25]: np.sqrt(((arr['none'] - none)**2).sum())<br>
> Out[25]: 0.0<br>
><br>
> In [26]: np.sqrt(((arr['both'] - both)**2).sum())<br>
> Out[26]: 0.0<br>
><br>
> In [27]: np.sqrt(((arr['left'] - left)**2).sum())<br>
> Out[27]: 0.015900515<br>
><br>
> In [28]: np.sqrt(((arr['right'] - right)**2).sum())<br>
> Out[28]: 0.015331409<br>
><br>
> #<br>
> # CCl<br>
> #<br>
> While the residuals are small, I'm wondering whether:<br>
> - It's a bug: the results should be exactly the same<br>
> - It's a feature: BLAS takes shortcuts when you have A.A'. The<br>
> difference is not significant. Quick: PR that asap!<br>
><br>
> I don't have enough expertise to answer that...<br>
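[Editor's note: the bug-vs-feature question has a third reading. The residual is consistent with BLAS choosing a different summation order for the transposed cases, which changes the rounding without being wrong, since floating-point addition is not associative. A small hypothetical illustration, not from the thread:]<br>

```python
import numpy as np

# Floating-point addition is not associative: grouping changes rounding.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same product computed through a transposed view vs. an explicit
# contiguous copy; any nonzero residual here is rounding noise of the
# same kind, because BLAS may pick a different path for each case.
rng = np.random.default_rng(0)
a = np.asarray(rng.standard_normal((500, 500)), dtype=np.float32)

p_view = a.T @ a          # BLAS sees a transposed operand
p_copy = a.T.copy() @ a   # plain GEMM on a contiguous copy

resid = np.sqrt(((p_view - p_copy) ** 2).sum())  # the thread's metric
print('residual:', resid)
```

The residual may be exactly zero or tiny depending on the BLAS build; a value around 1e-2 on 2000x2000 float32 inputs, as measured above, is plausible rounding noise rather than a wrong result.<br>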
><br>
> Thanks much!<br>
><br>
> -nicolas<br>
> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER<br>
</div></div><div class="im">> <<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a> <mailto:<a href="mailto:scheffer.nicolas@gmail.com">scheffer.nicolas@gmail.com</a>>> wrote:<br>
> > I too encourage users to use scipy.linalg for speed and robustness<br>
> > (hence calling this scipy.dot), but it just brings so much confusion!<br>
> > When using the scipy + numpy ecosystem, you'd almost want everything<br>
> > to be done with scipy so that you get the best implementation in all<br>
> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().<br>
> ><br>
> > Anyway, this is indeed for another thread; the confusion we'd like<br>
> > to fix here is that users shouldn't have to understand C/F-contiguity<br>
> > to get the maximum speed out of np.dot().<br>
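[Editor's note: Nicolas's snippet itself isn't quoted in this message. A minimal sketch of the idea, with a hypothetical helper name and assuming the scipy.linalg.blas gemm wrappers: inspect the contiguity flags and hand BLAS a transpose flag instead of letting a transposed view be copied.]<br>

```python
import numpy as np
from scipy.linalg import blas

def _fortran_view(x):
    """Return (F-contiguous 2-D array, trans flag) avoiding copies:
    the transpose of a C-contiguous array is already F-contiguous."""
    if x.flags['F_CONTIGUOUS']:
        return x, 0
    if x.flags['C_CONTIGUOUS']:
        return x.T, 1
    return np.asfortranarray(x), 0  # last resort: one copy

def fast_dot(a, b):
    """Hypothetical contiguity-aware dot for 2-D float32/float64
    arrays of matching dtype; passes BLAS transpose flags so that
    op(a)·op(b) is computed without intermediate copies."""
    ga, ta = _fortran_view(a)
    gb, tb = _fortran_view(b)
    gemm = blas.sgemm if a.dtype == np.float32 else blas.dgemm
    return gemm(1.0, ga, gb, trans_a=ta, trans_b=tb)

# Quick check against np.dot
a = np.random.randn(300, 200)
b = np.random.randn(200, 100)
c = np.random.randn(300, 50)
print(np.allclose(fast_dot(a, b), np.dot(a, b)))      # True
print(np.allclose(fast_dot(a.T, c), np.dot(a.T, c)))  # True
```

This is a sketch, not the actual posted snippet: it ignores 1-D inputs, mixed dtypes, and non-float arrays, all of which a real np.dot replacement would have to handle.<br>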
> ><br>
> > To summarize:<br>
> > - The python snippet I posted is still valid and can speed up your<br>
> > code if you can change all your dot() calls.<br>
> > - The change in dotblas.c is a bit more problematic because it<br>
> > touches very core code. I'm having trouble replicating the timings:<br>
> > I get better timings for a.dot(a.T) than for a.dot(a). There might be a bug.<br>
> ><br>
> > It's a pain to test because I cannot do the test in a single<br>
> python session.<br>
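[Editor's note: for the timing half of the comparison, at least, a standalone script can exercise all four transpose combinations in one run per numpy build; a sketch, with the size shrunk from the thread's 2000x2000 to keep it quick.]<br>

```python
import timeit
import numpy as np

# Time the four transpose combinations of np.dot in one process.
big = np.asarray(np.random.randn(500, 500), dtype='f')

cases = {
    "A.A":   lambda: np.dot(big, big),
    "A.A'":  lambda: np.dot(big, big.T),
    "A'.A":  lambda: np.dot(big.T, big),
    "A'.A'": lambda: np.dot(big.T, big.T),
}
times = {}
for name, fn in cases.items():
    # Best of three single runs, as %timeit's "best of" does.
    times[name] = min(timeit.repeat(fn, number=1, repeat=3))
    print('%-6s %.4f s' % (name, times[name]))
```

Run once against each numpy build and compare the printed tables; that avoids needing both builds importable in the same session.<br>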
> > I'm going to try to integrate most of your suggestions, I cannot<br>
> > guarantee I'll have time to do them all though.<br>
> ><br>
> > -nicolas<br>
> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <<a href="mailto:njs@pobox.com">njs@pobox.com</a><br>
</div><div class="im">> <mailto:<a href="mailto:njs@pobox.com">njs@pobox.com</a>>> wrote:<br>
> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux<br>
> >> <<a href="mailto:gael.varoquaux@normalesup.org">gael.varoquaux@normalesup.org</a><br>
</div><div><div class="h5">> <mailto:<a href="mailto:gael.varoquaux@normalesup.org">gael.varoquaux@normalesup.org</a>>> wrote:<br>
> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:<br>
> >>>> But what if someone compiles numpy against an optimized blas<br>
> >>>> (mkl, say) and then compiles SciPy against the reference blas?<br>
> >>>> What do you do then!? ;-)<br>
> >>><br>
> >>> This could happen. But the converse happens very often. What<br>
> >>> happens is that users (e.g. on a shared computing resource) ask<br>
> >>> for a scientific python environment. The administrator then<br>
> >>> installs the packages starting from the most basic one up to the<br>
> >>> most advanced one, thus starting with numpy, which can very well<br>
> >>> build without any external blas. When he gets to scipy, he hits<br>
> >>> the problem that the build system does not properly detect the<br>
> >>> blas, and he solves that problem.<br>
> >>><br>
> >>> Also, it used to be that on the major linux distributions numpy<br>
> >>> would not be built with an optimized lapack, because numpy was in<br>
> >>> the 'base' set of packages but lapack was not. Scipy, on the<br>
> >>> contrary, being in the 'contrib' set, could depend on lapack. I<br>
> >>> just checked, and this has been fixed in the major distributions<br>
> >>> (Fedora, Debian, Ubuntu).<br>
> >>><br>
> >>> Now, we can discuss whether such problems should happen and put<br>
> >>> the blame on the users/administrators; the fact is that they<br>
> >>> happen often. I keep seeing environments in which np.linalg is<br>
> >>> unreasonably slow.<br>
> >><br>
> >> If this is something that's been a problem for you, maybe we should<br>
> >> start another thread on things we could do to fix it directly?<br>
> >> Improve build instructions, advertise build systems that set up the<br>
> >> whole environment (and thus do the right thing), make numpy's<br>
> >> setup.py scream and yell if blas isn't available...?<br>
> >><br>
> >> -n<br>
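[Editor's note: on the diagnosis side, a check users can already run today is to ask numpy which BLAS/LAPACK it was built against and to eyeball a matmul timing; output format varies across NumPy versions.]<br>

```python
import time
import numpy as np

# Print the build-time BLAS/LAPACK configuration; an unoptimized build
# typically shows no optimized libraries here.
np.show_config()

# Crude runtime sanity check: a mid-sized matmul is roughly an order of
# magnitude slower with the reference BLAS than with ATLAS/OpenBLAS/MKL.
a = np.random.randn(1000, 1000)
t0 = time.time()
c = a.dot(a)
print('1000x1000 dot: %.3f s' % (time.time() - t0))
```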
> >> _______________________________________________<br>
> >> NumPy-Discussion mailing list<br>
</div></div>> >> <a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a> <mailto:<a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a>><br>
<div class="im">> >> <a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>
</div><div class="HOEnZb"><div class="h5">
><br>
><br>
><br>
><br>
> --<br>
> Information System Engineer, Ph.D.<br>
> Blog: <a href="http://matt.eifelle.com" target="_blank">http://matt.eifelle.com</a><br>
> LinkedIn: <a href="http://www.linkedin.com/in/matthieubrucher" target="_blank">http://www.linkedin.com/in/matthieubrucher</a><br>
> Music band: <a href="http://liliejay.com/" target="_blank">http://liliejay.com/</a><br>
><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div></div>