[Numpy-discussion] expensive tensordot

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Tue Jun 15 15:42:33 EDT 2010


Paul Northug wrote:
> I have a computation whose runtime is dominated by one step, and I
> have always wondered how to make it fast enough to be usable. I
> suspect I will have to resort to an approximation, but I was hoping
> someone could spot a major inefficiency in my implementation.
> 
> The calculation is a kind of outer product of two sets of time series:
> 
> for n, p in np.ndindex(self.P, self.P):
>     self.A[:,n,:,p] += np.tensordot(a[:,:,n:n+self.T],
>                                     a[:,:,p:p+self.T],
>                                     ([0,2], [0,2]))
> 
> The sizes of all the indices are on the order of 100, except that
> a.shape[0] == 10. One known inefficiency is that A is symmetric in
> (n,p), but exploiting that only saves a factor of 2.

The memory access pattern looks quite inefficient. NumPy arrays are 
C-ordered by default, so the last index varies fastest in memory; it is 
better to avoid indexing or slicing in the last dimension yourself and 
leave that to tensordot if at all possible. Use arr.T.copy() or 
arr.copy('F') before or after the loop to get the data into a suitable 
layout.
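
One way to apply this (a minimal, self-contained sketch; the sizes K, 
C, P, T and the helper array A_t are illustrative assumptions, not 
taken from the original code): move the sliced time axis to the front 
with a copy so each a_t[n:n+T] slice is one contiguous block, 
accumulate into an array whose leading axes are (n, p), and exploit 
the symmetry in (n, p):

import numpy as np

# Toy sizes for illustration only; the post says everything is on the
# order of 100 except a.shape[0] == 10.
K, C, P, T = 10, 20, 20, 20
a = np.random.rand(K, C, P + T - 1)

# Put the sliced (time) axis first, with a copy, so each a_t[n:n+T]
# slice is a contiguous block of memory.
a_t = np.ascontiguousarray(a.transpose(2, 0, 1))   # shape (time, K, C)

# Accumulate into an array whose leading axes are (n, p) so the writes
# are contiguous too, and only compute the upper triangle in (n, p).
A_t = np.zeros((P, P, C, C))
for n, p in np.ndindex(P, P):
    if p < n:
        continue
    block = np.tensordot(a_t[n:n+T], a_t[p:p+T], ([0, 1], [0, 1]))
    A_t[n, p] += block
    if p != n:
        A_t[p, n] += block.T

# Back to the original (C, n, C, p) layout.
A = A_t.transpose(2, 0, 3, 1)

# Sanity check against the original formulation.
A_ref = np.zeros((C, P, C, P))
for n, p in np.ndindex(P, P):
    A_ref[:, n, :, p] += np.tensordot(a[:, :, n:n+T],
                                      a[:, :, p:p+T],
                                      ([0, 2], [0, 2]))
assert np.allclose(A, A_ref)

Whether this helps in practice depends on the actual shapes, so treat 
it as a starting point for profiling rather than a drop-in replacement.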

-- 
Dag Sverre


