[Numpy-discussion] Scipy dot

Nicolas SCHEFFER scheffer.nicolas at gmail.com
Fri Nov 9 17:52:22 EST 2012

```Ok: comparing apples to apples. I'm clueless on my observations and
would need input from you guys.

Using ATLAS 3.10, numpy with and without my changes, I'm getting these
timings and comparisons.

#
#I. Generate matrices using regular dot:
#
big = np.array(np.random.randn(2000, 2000), 'f');
np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
left=big.T.dot(big), right=big.dot(big.T))"

#
#II. Timings with regular dot
#
In [3]: %timeit np.dot(big, big)
10 loops, best of 3: 138 ms per loop

In [4]: %timeit np.dot(big, big.T)
10 loops, best of 3: 166 ms per loop

In [5]: %timeit np.dot(big.T, big.T)
10 loops, best of 3: 193 ms per loop

In [6]: %timeit np.dot(big.T, big)
10 loops, best of 3: 165 ms per loop

#
#III. I load these arrays and time again with the "fast" dot
#
In [21]: %timeit np.dot(big, big)
10 loops, best of 3: 138 ms per loop

In [22]: %timeit np.dot(big.T, big)
10 loops, best of 3: 104 ms per loop

In [23]: %timeit np.dot(big.T, big.T)
10 loops, best of 3: 138 ms per loop

In [24]: %timeit np.dot(big, big.T)
10 loops, best of 3: 102 ms per loop

1. A'.A': great!
2. A.A' becomes faster than A.A !?!

#
#IV. MSE on differences
#
In [25]: np.sqrt(((arr['none'] - none)**2).sum())
Out[25]: 0.0

In [26]: np.sqrt(((arr['both'] - both)**2).sum())
Out[26]: 0.0

In [27]: np.sqrt(((arr['left'] - left)**2).sum())
Out[27]: 0.015900515

In [28]: np.sqrt(((arr['right'] - right)**2).sum())
Out[28]: 0.015331409

#
# CCl
#
While the MSE are small, I'm wondering whether:
- It's a bug: it should be exactly the same
- It's a feature: BLAS is taking shortcuts when you have A.A'. The
difference is not significant. Quick: PR that asap!

I don't have enough expertise to answer that...

Thanks much!

-nicolas
On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
<scheffer.nicolas at gmail.com> wrote:
> I too encourage users to use scipy.linalg for speed and robustness
> (hence calling this scipy.dot), but it just brings so much confusion!
> When using the scipy + numpy ecosystem, you'd almost want everything
> cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().
>
> Anyway this is indeed for another thread, the confusion we'd like to
> fix here is that users shouldn't have to understand the C/F contiguous
> concepts to get the maximum speed for np.dot()
>
> To summarize:
> - The python snippet I posted is still valid and can speed up your
> code if you can change all your dot() calls.
> - The change in dotblas.c is a bit more problematic because it's very
> core. I'm having issues right now to replicate the timings, I've got
> better timing for a.dot(a.T) than for a.dot(a). There might be a bug.
>
> It's a pain to test because I cannot do the test in a single python session.
> I'm going to try to integrate most of your suggestions, I cannot
> guarantee I'll have time to do them all though.
>
> -nicolas
> On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
>> <gael.varoquaux at normalesup.org> wrote:
>>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:
>>>> But what if someone compiles numpy against an optimized blas (mkl,
>>>> say) and then compiles SciPy against the reference blas? What do you
>>>> do then!? ;-)
>>>
>>> This could happen. But the converse happens very often. What happens is
>>> that users (eg on shared computing resource) ask for a scientific python
>>> environment. The administrator than installs the package starting from
>>> the most basic one, to the most advanced one, thus starting with numpy
>>> that can very well build without any external blas. When he gets to scipy
>>> he hits the problem that the build system does not detect properly the
>>> blas, and he solves that problem.
>>>
>>> Also, it used to be that on the major linux distributions, numpy would not
>>> be build with an optimize lapack because numpy was in the 'base' set of
>>> packages, but not lapack. On the contrary, scipy being in the 'contrib'
>>> set, it could depend on lapack. I just checked, and this has been fixed
>>> in the major distributions (Fedora, Debian, Ubuntu).
>>>
>>> Now we can discuss with such problems should not happen, and put the
>>> blame on the users/administrators, the fact is that they happen often. I
>>> keep seeing environments in which np.linalg is unreasonnably slow.
>>
>> If this is something that's been a problem for you, maybe we should
>> start another thread on things we could do to fix it directly? Improve
>> build instructions, advertise build systems that set up the whole
>> environment (and thus do the right thing), make numpy's setup.py
>> scream and yell if blas isn't available...?
>>
>> -n
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion

```