Re: [Numpy-discussion] Numpy 1.11.0b1 is out

Feb. 1, 2016

...
On 31 Jan 2016, at 9:48 am, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Sat, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
...
On 27 Jan 2016, at 1:10 pm, Sebastian Berg <sebastian@sipsolutions.net> wrote:
...
On Wed, 2016-01-27 at 11:19 +0000, Nadav Horesh wrote:
...
Why is the dot function/method slower than @ on Python 3.5.1? Tested with the latest 1.11 maintenance branch.
I think the explanation is that you do not have an optimised BLAS, in which case the fallback mode is probably faster for @ (since it gets SSE2 optimisation by using einsum, while np.dot does not).
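The following is a minimal sketch for reproducing this kind of comparison. It times `A @ c`, `np.dot(A, c)` and an equivalent `np.einsum` contraction. The shapes are hypothetical because the thread only says that one array was 3-D. Relative timings will depend on the NumPy version and on whether an optimised BLAS is available, so no particular ranking is implied.

```python
import timeit

import numpy as np

# Hypothetical shapes: the thread only says that one operand was 3-D.
rng = np.random.RandomState(0)
A = rng.standard_normal((20, 64, 64))
c = rng.standard_normal((64, 64))

# Three ways to contract the last axis of A with the first axis of c.
r_matmul = A @ c
r_dot = np.dot(A, c)
r_einsum = np.einsum("kij,jl->kil", A, c)

# With A.ndim == 3 and c.ndim == 2 all three agree up to rounding.
assert np.allclose(r_matmul, r_dot) and np.allclose(r_matmul, r_einsum)

# Best-of-5 timings; which call wins depends on the NumPy version
# and on which BLAS (if any) NumPy was built against.
candidates = [
    ("A @ c", lambda: A @ c),
    ("np.dot(A, c)", lambda: np.dot(A, c)),
    ("np.einsum", lambda: np.einsum("kij,jl->kil", A, c)),
]
for name, stmt in candidates:
    best = min(timeit.repeat(stmt, number=10, repeat=5)) / 10
    print(f"{name:14s} {best * 1e3:8.3f} ms per call")
```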
I am a bit confused now, as A @ c is just short for A.__matmul__(c), or equivalently np.matmul(A, c), so why would these not use the optimised BLAS?
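The equivalence mentioned here can be checked directly. This small sketch uses hypothetical integer-valued arrays so that exact comparison is safe. It assumes Python 3.5+ and a NumPy with `@` support (1.10 or later).

```python
import numpy as np

A = np.arange(24.0).reshape(2, 3, 4)  # hypothetical 3-D operand
c = np.arange(20.0).reshape(4, 5)     # hypothetical 2-D operand

# The @ operator dispatches to ndarray.__matmul__, which computes np.matmul.
via_operator = A @ c
via_dunder = A.__matmul__(c)
via_function = np.matmul(A, c)

assert np.array_equal(via_operator, via_dunder)
assert np.array_equal(via_operator, via_function)
print(via_operator.shape)  # (2, 3, 5)
```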
Also, I am getting almost identical results on my Mac, yet I thought numpy would by default build against the VecLib optimised BLAS. If I build explicitly against ATLAS, I actually see slightly slower results.
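To check which BLAS/LAPACK a given NumPy build is actually linked against, `np.show_config()` prints the build configuration. The exact layout of that output differs between NumPy versions, so this sketch only shows where to look.

```python
import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK libraries
# it was linked against (e.g. Accelerate/vecLib, OpenBLAS, ATLAS, MKL).
np.show_config()
print("NumPy version:", np.__version__)
```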
But I also saw this kind of warning on the first timeit runs:
    %timeit A.dot(c)
    The slowest run took 6.91 times longer than the fastest. This could mean that an intermediate result is being cached
And when testing much larger arrays, the discrepancy between matmul and dot, if anything, increases, so perhaps this is more an issue of a less memory-efficient implementation in np.dot?
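Outside IPython, a similar measurement can be made with the standard `timeit` module. The sketch below uses a hypothetical helper, `time_with_spread`, that reports the best per-call time together with the slowest/fastest ratio. That ratio is the quantity behind the "slowest run took N times longer" message, although the helper does not replicate IPython's own warning logic. The array sizes are also hypothetical.

```python
import timeit

import numpy as np


def time_with_spread(stmt, number=5, repeat=5):
    """Hypothetical helper: best per-call time and slowest/fastest ratio."""
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    best, worst = min(runs), max(runs)
    return best / number, worst / best


rng = np.random.RandomState(0)
for n in (32, 128):  # hypothetical sizes; the thread does not give its shapes
    A = rng.standard_normal((8, n, n))
    c = rng.standard_normal((n, n))
    for name, stmt in [("A @ c", lambda: A @ c), ("A.dot(c)", lambda: A.dot(c))]:
        per_call, spread = time_with_spread(stmt)
        print(f"n={n:4d} {name:9s} {per_call * 1e3:8.3f} ms  slowest/fastest={spread:.2f}")
```

A large ratio on the first invocation usually reflects warm-up effects (caches, lazy initialisation) rather than the steady-state cost, which is why the best run is reported.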
Sorry, I missed the fact that one of the arrays was 3D. In that case I am not even sure which of the functions call into BLAS or what else they have to do; I would have to check. Note that `np.dot` uses a different way of combining higher-dimensional arrays: @/matmul broadcasts the extra axes, while np.dot does the outer combination of them, so that the result shape is:
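# A and c are ndarrays; .shape is a tuple, so copy it to a list first
As = list(A.shape)
As.pop(-1)                          # drop the summed (last) axis of A
cs = list(c.shape)
cs.pop(-2 if len(cs) >= 2 else -1)  # drop c's second-to-last axis, or its only axis if c is 1-D
result_shape = tuple(As + cs)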
which happens to be identical to the matmul result shape when only A has extra axes, i.e. A.ndim > 2 and c.ndim <= 2.
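The shape rule can be checked against NumPy itself. This sketch wraps it in a hypothetical helper, `dot_result_shape`, and compares it with `np.dot` for a few hypothetical shapes, printing the matmul shape alongside it. It also illustrates the "outer combination" for two 3-D stacks. matmul pairs `A[i]` with `C[i]`, while dot pairs every `A[i]` with every `C[j]`, so matmul's result is the `i == j` diagonal of dot's.

```python
import numpy as np


def dot_result_shape(a_shape, c_shape):
    """Hypothetical helper: result shape of np.dot(a, c) for 1-D or higher a and c."""
    As = list(a_shape)
    As.pop(-1)                          # summed (last) axis of a
    cs = list(c_shape)
    cs.pop(-2 if len(cs) >= 2 else -1)  # summed axis of c
    return tuple(As + cs)


# Compare np.dot and np.matmul result shapes for a few operand shapes.
for a_shape, c_shape in [((3, 4), (4,)), ((3, 4), (4, 5)),
                         ((2, 3, 4), (4, 5)), ((2, 3, 4), (2, 4, 5))]:
    a, c = np.ones(a_shape), np.ones(c_shape)
    assert np.dot(a, c).shape == dot_result_shape(a_shape, c_shape)
    print(a_shape, c_shape, "dot:", np.dot(a, c).shape,
          "matmul:", np.matmul(a, c).shape)

# Both operands 3-D: matmul broadcasts the leading axis, dot combines it.
A = np.arange(24.0).reshape(2, 3, 4)
C = np.arange(40.0).reshape(2, 4, 5)
d = np.dot(A, C)     # shape (2, 3, 2, 5)
m = np.matmul(A, C)  # shape (2, 3, 5)
i = np.arange(A.shape[0])
assert np.array_equal(m, d[i, :, i, :])  # matmul == "diagonal" of dot
```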
Makes sense now; with A.ndim = 2 both operations take about the same time (and are ~50% faster with VecLib than with ATLAS) and yield identical results, while each additional dimension in A adds more overhead to np.dot, and the results agree under np.allclose but are not exactly identical.
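The following is a minimal sketch of that kind of check, using hypothetical random operands. Different code paths can accumulate the products in a different order, so the last bits may differ even though the results agree within floating-point tolerance. Whether they happen to be bit-identical on a given machine is not guaranteed either way, so only the tolerance check is asserted.

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.standard_normal((10, 50, 60))  # hypothetical 3-D operand
c = rng.standard_normal((60, 40))      # hypothetical 2-D operand

r_dot = np.dot(A, c)
r_matmul = A @ c

# Report how far apart the two results are and whether they match bit for bit.
print("max abs difference:", np.max(np.abs(r_dot - r_matmul)))
print("exactly equal:", np.array_equal(r_dot, r_matmul))
assert np.allclose(r_dot, r_matmul)
```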

Thanks,
						Derek