Hi,
I was playing around with einsum (which is really cool, by the way!),
and I noticed something strange (or unexpected, at least)
performance-wise. Here is an example:
Let x and y be two NxM arrays, and W be an NxN array. I want to compute
f = x_{ik} W_{ij} y_{jk}
(summing over all repeated indices, so f is a scalar; I hope the notation
is clear enough)
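In case the index notation is ambiguous, here is a naive explicit-loop
version that pins down exactly what I mean (obviously not how I'd
compute it in practice):
#========================================================
import numpy as np

def f_reference(x, W, y):
    # Explicit-loop version of f = x_{ik} W_{ij} y_{jk},
    # summed over all of i, j, k.
    n, m = x.shape
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(m):
                total += x[i, k] * W[i, j] * y[j, k]
    return total
#========================================================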
What surprises me is that it's much faster to compute the two products
separately, as in ((xW)y) or (x(Wy)), rather than passing the
three-operand expression directly to einsum. I did a quick benchmark
with the following script:
#========================================================
import numpy as np

def f1(x, W, y):
    # Single three-operand einsum call.
    return np.einsum('ik,ij,jk', x, W, y)

def f2(x, W, y):
    # Same contraction done pairwise: first x with W,
    # then the result with y.
    return np.einsum('jk,jk', np.einsum('ik,ij->jk', x, W), y)

n = 30
m = 150
x = np.random.random(size=(n, m))
y = np.random.random(size=(n, m))
W = np.random.random(size=(n, n))

setup = """\
from __main__ import f1,f2,x,y,W
"""

if __name__ == '__main__':
    import timeit
    min_f1_time = min(timeit.repeat(stmt='f1(x,W,y)', setup=setup,
                                    repeat=10, number=10000))
    min_f2_time = min(timeit.repeat(stmt='f2(x,W,y)', setup=setup,
                                    repeat=10, number=10000))
    print('f1: {time}'.format(time=min_f1_time))
    print('f2: {time}'.format(time=min_f2_time))
#============================================================
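(By the way, the two versions do agree numerically; a quick check like
#========================================================
# Sanity check: both formulations compute the same scalar.
assert np.allclose(f1(x, W, y), f2(x, W, y))
#========================================================
confirms they compute the same thing.)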
On a particular trial on my Intel 64-bit machine running Linux and
numpy 1.6.1, I get the following:
f1: 2.86719584465
f2: 0.891730070114
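For what it's worth, the same scalar can also be written as a plain
matrix product followed by an elementwise multiply-and-sum (if I've
kept my transposes straight), which gives yet another baseline to
compare against:
#========================================================
def f3(x, W, y):
    # f = sum_{jk} (x^T W)_{kj} y_{jk}: one matrix product
    # (which goes through BLAS), then an elementwise multiply
    # and a full sum.
    return np.sum(x.T.dot(W) * y.T)
#========================================================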
Of course, I know nothing of einsum's internals, and there's probably a
good reason for this. Still, it's quite counterintuitive to me; I
just thought I'd mention it because a quick search didn't bring up
anything on this topic, and AFAIK einsum's docs/examples don't cover the
case with more than two operands.
Anyway: I hope I've been clear enough, and that I'm not rehashing an
already-debated topic.
Thanks in advance for any clarification,
Eugenio