
On 31 Jan 2016, at 9:48 am, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Sa, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
On 27 Jan 2016, at 1:10 pm, Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Mi, 2016-01-27 at 11:19 +0000, Nadav Horesh wrote:
Why is the dot function/method slower than @ on Python 3.5.1? Tested from the latest 1.11 maintenance branch.
The explanation, I think, is that you do not have an optimized BLAS, in which case the fallback mode is probably faster in the @ case (since it gets SSE2 optimization by using einsum, while np.dot does not).
I am a bit confused now, as A @ c is just short for A.__matmul__(c) or equivalent to np.matmul(A,c), so why would these not use the optimised BLAS? Also, I am getting almost identical results on my Mac, yet I thought numpy would by default build against the VecLib optimised BLAS. If I build explicitly against ATLAS, I am actually seeing slightly slower results. But I also saw this kind of warning on the first timeit runs:
    %timeit A.dot(c)
    The slowest run took 6.91 times longer than the fastest. This could mean that an intermediate result is being cached
and when testing much larger arrays, the discrepancy between matmul and dot increases further, so perhaps this is more an issue of a less memory-efficient implementation in np.dot?
Sorry, I missed the fact that one of the arrays was 3D. In that case I am not even sure which of the functions call into BLAS or what else they have to do; I would have to check. Note that `np.dot` uses a different way of combining high-dimensional arrays. @/matmul broadcasts extra axes, while np.dot will do the outer combination of them, so that the result is:
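    As = list(A.shape)                # shape is a tuple, so copy it into a list
    As.pop(-1)                        # drop the contracted (last) axis of A
    cs = list(c.shape)
    cs.pop(-2 if c.ndim > 1 else -1)  # drop the contracted axis of c (the only axis if c is 1-D)
    result_shape = As + cs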
which happens to be identical to the @/matmul result shape when only A has extra dimensions, i.e. A.ndim > 2 and c.ndim <= 2.
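The shape rule described above can be checked on small arrays. This is only a sketch: the helper name `dot_result_shape` is hypothetical (it is not a NumPy function) and assumes both inputs have at least one dimension; the array sizes are arbitrary.

```python
import numpy as np

def dot_result_shape(a_shape, b_shape):
    """Hypothetical helper: shape np.dot is expected to return for arrays with ndim >= 1."""
    a_rest = list(a_shape[:-1])                  # drop the contracted last axis of a
    b_rest = list(b_shape)
    b_rest.pop(-2 if len(b_rest) > 1 else -1)    # drop the contracted axis of b
    return tuple(a_rest + b_rest)

A = np.ones((4, 2, 3))
c2 = np.ones((3, 5))      # c.ndim <= 2: dot and matmul give the same shape
c3 = np.ones((4, 3, 5))   # c.ndim > 2: the two rules diverge

shape_dot_2d = np.dot(A, c2).shape        # outer combination of extra axes
shape_matmul_2d = np.matmul(A, c2).shape  # broadcasting of extra axes
shape_dot_3d = np.dot(A, c3).shape        # (4, 2) from A plus (4, 5) from c3
shape_matmul_3d = np.matmul(A, c3).shape  # leading axis of size 4 is shared
```

With a 3D `c`, np.dot keeps the leading axes of both operands side by side, while matmul treats the leading axis as a shared batch dimension.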
Makes sense now; with A.ndim = 2 both operations take about the same time (and are ~50% faster with VecLib than with ATLAS) and yield identical results, while any additional dimension in A adds more overhead time to np.dot, and the results pass np.allclose, but are not exactly identical.

Thanks,
Derek
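For anyone wanting to repeat the comparison of results, a minimal sketch follows. The array sizes and the seed are assumptions, not the ones used in this thread, and whether the two results are bit-identical will depend on the BLAS and NumPy build.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
A = rng.rand(6, 20, 30)          # 3D left operand, as in the case discussed
c = rng.rand(30, 10)             # 2D right operand

via_dot = A.dot(c)
via_matmul = np.matmul(A, c)     # what A @ c calls on Python 3.5+

same_shape = via_dot.shape == via_matmul.shape
close = np.allclose(via_dot, via_matmul)
max_abs_diff = np.abs(via_dot - via_matmul).max()   # 0.0 only if bit-identical
```

Wrapping `A.dot(c)` and `np.matmul(A, c)` in `%timeit` on the same arrays gives the timing side of the comparison.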