[Numpy-discussion] Scipy dot
Matthieu Brucher
matthieu.brucher at gmail.com
Fri Nov 9 17:57:10 EST 2012
Hi,
A.A slower than A.A' is not a surprise for me. The latter is far more cache
friendly that the former. Everything follows cache lines, so it is faster
than something that will use one element from each cache line. In fact it
is exactly what "proves" that the new version is correct.
Good job (if all the tests were made and still pass ;) )
Cheers,
Matthieu
2012/11/9 Nicolas SCHEFFER <scheffer.nicolas at gmail.com>
> Ok: comparing apples to apples. I'm clueless on my observations and
> would need input from you guys.
>
> Using ATLAS 3.10, numpy with and without my changes, I'm getting these
> timings and comparisons.
>
> #
> #I. Generate matrices using regular dot:
> #
> big = np.array(np.random.randn(2000, 2000), 'f');
> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
> left=big.T.dot(big), right=big.dot(big.T))"
>
> #
> #II. Timings with regular dot
> #
> In [3]: %timeit np.dot(big, big)
> 10 loops, best of 3: 138 ms per loop
>
> In [4]: %timeit np.dot(big, big.T)
> 10 loops, best of 3: 166 ms per loop
>
> In [5]: %timeit np.dot(big.T, big.T)
> 10 loops, best of 3: 193 ms per loop
>
> In [6]: %timeit np.dot(big.T, big)
> 10 loops, best of 3: 165 ms per loop
>
> #
> #III. I load these arrays and time again with the "fast" dot
> #
> In [21]: %timeit np.dot(big, big)
> 10 loops, best of 3: 138 ms per loop
>
> In [22]: %timeit np.dot(big.T, big)
> 10 loops, best of 3: 104 ms per loop
>
> In [23]: %timeit np.dot(big.T, big.T)
> 10 loops, best of 3: 138 ms per loop
>
> In [24]: %timeit np.dot(big, big.T)
> 10 loops, best of 3: 102 ms per loop
>
> 1. A'.A': great!
> 2. A.A' becomes faster than A.A !?!
>
> #
> #IV. MSE on differences
> #
> In [25]: np.sqrt(((arr['none'] - none)**2).sum())
> Out[25]: 0.0
>
> In [26]: np.sqrt(((arr['both'] - both)**2).sum())
> Out[26]: 0.0
>
> In [27]: np.sqrt(((arr['left'] - left)**2).sum())
> Out[27]: 0.015900515
>
> In [28]: np.sqrt(((arr['right'] - right)**2).sum())
> Out[28]: 0.015331409
>
> #
> # CCl
> #
> While the MSE are small, I'm wondering whether:
> - It's a bug: it should be exactly the same
> - It's a feature: BLAS is taking shortcuts when you have A.A'. The
> difference is not significant. Quick: PR that asap!
>
> I don't have enough expertise to answer that...
>
> Thanks much!
>
> -nicolas
> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
> <scheffer.nicolas at gmail.com> wrote:
> > I too encourage users to use scipy.linalg for speed and robustness
> > (hence calling this scipy.dot), but it just brings so much confusion!
> > When using the scipy + numpy ecosystem, you'd almost want everything
> > be done with scipy so that you get the best implementation in all
> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().
> >
> > Anyway this is indeed for another thread, the confusion we'd like to
> > fix here is that users shouldn't have to understand the C/F contiguous
> > concepts to get the maximum speed for np.dot()
> >
> > To summarize:
> > - The python snippet I posted is still valid and can speed up your
> > code if you can change all your dot() calls.
> > - The change in dotblas.c is a bit more problematic because it's very
> > core. I'm having issues right now to replicate the timings, I've got
> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug.
> >
> > It's a pain to test because I cannot do the test in a single python
> session.
> > I'm going to try to integrate most of your suggestions, I cannot
> > guarantee I'll have time to do them all though.
> >
> > -nicolas
> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <njs at pobox.com> wrote:
> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
> >> <gael.varoquaux at normalesup.org> wrote:
> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:
> >>>> But what if someone compiles numpy against an optimized blas (mkl,
> >>>> say) and then compiles SciPy against the reference blas? What do you
> >>>> do then!? ;-)
> >>>
> >>> This could happen. But the converse happens very often. What happens is
> >>> that users (eg on shared computing resource) ask for a scientific
> python
> >>> environment. The administrator than installs the package starting from
> >>> the most basic one, to the most advanced one, thus starting with numpy
> >>> that can very well build without any external blas. When he gets to
> scipy
> >>> he hits the problem that the build system does not detect properly the
> >>> blas, and he solves that problem.
> >>>
> >>> Also, it used to be that on the major linux distributions, numpy would
> not
> >>> be build with an optimize lapack because numpy was in the 'base' set of
> >>> packages, but not lapack. On the contrary, scipy being in the 'contrib'
> >>> set, it could depend on lapack. I just checked, and this has been fixed
> >>> in the major distributions (Fedora, Debian, Ubuntu).
> >>>
> >>> Now we can discuss with such problems should not happen, and put the
> >>> blame on the users/administrators, the fact is that they happen often.
> I
> >>> keep seeing environments in which np.linalg is unreasonnably slow.
> >>
> >> If this is something that's been a problem for you, maybe we should
> >> start another thread on things we could do to fix it directly? Improve
> >> build instructions, advertise build systems that set up the whole
> >> environment (and thus do the right thing), make numpy's setup.py
> >> scream and yell if blas isn't available...?
> >>
> >> -n
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20121109/162caeba/attachment.html>
More information about the NumPy-Discussion
mailing list