[Numpy-discussion] einsum performance with many operands

Tue Nov 1 12:03:11 EDT 2011

Hi,

   I was playing around with einsum (which is really cool, by the way!), 
and I noticed something strange (or unexpected at least) 
performance-wise. Here is an example:

Let x and y be two NxM arrays, and W be a NxN array. I want to compute

f = x^{ik} W_i^j y_{jk}

(I hope the notation is clear enough)
What surprises me is that it's much faster to compute the two products 
separately - as in ((xW)y) or (x(Wy)), rather than directly passing the 
three-operands expression to einsum. I did a quick benchmark, with the 
following script

#========================================================
import numpy as np

def f1(x,W,y):
     return np.einsum('ik,ij,jk', x,W,y)

def f2(x,W,y):
     return np.einsum('jk,jk', np.einsum('ik,ij->jk', x, W), y)

n = 30
m = 150

x=np.random.random(size=(n, m))
y=np.random.random(size=(n, m))
W=np.random.random(size=(n, n))

setup = """\
from __main__ import f1,f2,x,y,W
"""

if __name__ == '__main__':
     import timeit
     min_f1_time = min(timeit.repeat(stmt='f1(x,W,y)', setup=setup, 
repeat=10, number=10000))
     min_f2_time = min(timeit.repeat(stmt='f2(x,W,y)', setup=setup, 
repeat=10, number=10000))
     print('f1: {time}'.format(time=min_f1_time))
     print('f2: {time}'.format(time=min_f2_time))
#============================================================

and I get (on a particular trial on my intel 64 bit machine running 
linux and numpy 1.6.1) the following:
f1: 2.86719584465
f2: 0.891730070114

Of course, I know nothing of einsum's internals and there's probably a 
good reason for this. But still, it's quite counterintuitive for me; I 
just thought I'd mention it because a quick search didn't bring up 
anything on this topic, and AFAIK einsum's docs/examples don't cover the 
case with more than two operands.

Anyway: I hope I've been clear enough, and that I didn't bring up some 
already-debated topic.

Thanks in advance for any clarification,

      Eugenio