einsum slow vs (tensor)dot

Hi,

I was just looking at the einsum function. To me, it's a really elegant and clear way of doing array operations, which is the core of what numpy is about. It removes the need to remember a range of functions, some of which I find tricky (e.g. tile).

Unfortunately the present implementation seems ~4-6x slower than dot or tensordot for decent-size arrays. I suspect it is because the implementation does not use BLAS/LAPACK calls.

E.g. (in ipython on Mac OS X 10.6, python 2.7.3, numpy 1.6.2 from macports):

a = np.arange(600000.).reshape(1500,400)
b = np.arange(240000.).reshape(400,600)
c = np.arange(600)
d = np.arange(400)

%timeit np.einsum('ij,jk', a, b)
10 loops, best of 3: 156 ms per loop
%timeit np.dot(a,b)
10 loops, best of 3: 27.4 ms per loop

%timeit np.einsum('i,ij,j',d,b,c)
1000 loops, best of 3: 709 us per loop
%timeit np.dot(d,np.dot(b,c))
10000 loops, best of 3: 121 us per loop

or

abig = np.arange(4800.).reshape(6,8,100)
bbig = np.arange(1920.).reshape(8,6,40)

%timeit np.einsum('ijk,jil->kl', abig, bbig)
1000 loops, best of 3: 425 us per loop
%timeit np.tensordot(abig,bbig, axes=([1,0],[0,1]))
10000 loops, best of 3: 105 us per loop

cheers, George Nurser.

On Wed, Oct 24, 2012 at 7:18 AM, George Nurser <gnurser@gmail.com> wrote:
Hi George,

IIRC (and I haven't dug into it heavily; not being a physicist, I don't encounter this notation often), einsum implements a superset of what dot or tensordot (and the corresponding BLAS calls) can do. So I think logic is needed to carve out the special cases in which an einsum can be performed quickly with BLAS.

Pull requests in this vein would certainly be welcome, but they require the attention of someone who really understands how einsum works/can work.

David
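[Editor's sketch, not part of the original message: to illustrate the kind of special-casing David describes, here is a hypothetical dispatcher (this is not NumPy's actual code) that recognizes a couple of BLAS-friendly subscript patterns and routes them to np.dot, falling back to np.einsum for everything else.]

```python
import numpy as np

def fast_contract(subscripts, *operands):
    """Hypothetical sketch: route a few BLAS-friendly einsum
    patterns to the BLAS-backed np.dot, and fall back to
    np.einsum for the general case."""
    # 2-d matrix product: 'ij,jk' (implicit output) or 'ij,jk->ik'
    if subscripts in ('ij,jk', 'ij,jk->ik') and len(operands) == 2:
        return np.dot(*operands)
    # vector inner product: 'i,i' or 'i,i->'
    if subscripts in ('i,i', 'i,i->') and len(operands) == 2:
        return np.dot(*operands)
    # No BLAS shortcut recognized: use the generic einsum loop.
    return np.einsum(subscripts, *operands)

a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)
assert np.allclose(fast_contract('ij,jk', a, b),
                   np.einsum('ij,jk', a, b))
```

A real implementation would of course have to match patterns up to index renaming and handle strides and dtypes, which is the hard part David alludes to.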

On 25 October 2012 22:54, David Warde-Farley <wardefar@iro.umontreal.ca>wrote:
Hi George,
IIRC (and I haven't dug into it heavily; not a physicist so I don't encounter this notation often), einsum implements a superset of what dot or tensordot (and the corresponding BLAS calls) can do. So, I think that logic is needed to carve out the special cases in which an einsum can be performed quickly with BLAS.
Hi David,

Yes, that's my reading of the situation as well.
Pull requests in this vein would certainly be welcome, but requires the attention of someone who really understands how einsum works/can work.
...and I guess how to interface with BLAS/LAPACK.

cheers, George.
David

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
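[Editor's sketch, not part of the original thread: for the curious, the BLAS matrix-multiply routine George mentions interfacing with is reachable from Python via scipy.linalg.blas. This only illustrates what the underlying call looks like; a C-level einsum would call the BLAS directly.]

```python
import numpy as np
from scipy.linalg import blas

a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)

# dgemm computes alpha * (a @ b): this is the double-precision
# GEMM routine that np.dot typically reaches for float64 matrices.
result = blas.dgemm(alpha=1.0, a=a, b=b)
assert np.allclose(result, np.dot(a, b))
```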
participants (2)
- David Warde-Farley
- George Nurser