Hi,<div><br></div><div>I was just looking at the einsum function.</div><div>To me, it's a really elegant and clear way of doing array operations, which is the core of what numpy is about.</div><div>It removes the need to remember a range of functions, some of which I find tricky (e.g. tile).</div>
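<div><br></div><div>As a rough illustration of that point (a sketch, not part of the measurements further down), a few common operations can be written as einsum subscripts and checked against the dedicated functions. The small arrays m, v and w are made up for this example.</div>

```python
import numpy as np

# Hypothetical small inputs, chosen only to keep the example readable.
m = np.arange(6.).reshape(2, 3)
v = np.arange(3.)
w = np.arange(2.)

# Matrix-vector product, the same result as np.dot(m, v).
mv = np.einsum('ij,j->i', m, v)

# Outer product, the same result as np.outer(w, v).
outer = np.einsum('i,j->ij', w, v)

# Transpose, the same result as m.T.
mt = np.einsum('ij->ji', m)

# Sum over the second axis, the same result as m.sum(axis=1).
row_sums = np.einsum('ij->i', m)

# Scale every row of m by v without building a tiled copy of v,
# the same result as m * np.tile(v, (2, 1)).
scaled = np.einsum('ij,j->ij', m, v)
```

<div>Each line names its axes explicitly, so there is no need to work out which function or which axis argument applies.</div>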
<div><br></div><div>
Unfortunately, the present implementation seems to be roughly 4-6x slower than dot or tensordot for decent-sized arrays.</div><div>I suspect this is because the implementation does not use BLAS/LAPACK calls.</div><div><br></div><div>cheers, George Nurser.</div>
<div><br></div><div>For example (in IPython on Mac OS X 10.6, Python 2.7.3, NumPy 1.6.2 from MacPorts):</div>
<pre>import numpy as np
a = np.arange(600000.).reshape(1500,400)
b = np.arange(240000.).reshape(400,600)
c = np.arange(600)
d = np.arange(400)</pre>
<div><br></div>
<pre>%timeit np.einsum('ij,jk', a, b)</pre>
<pre>10 loops, best of 3: 156 ms per loop</pre>
<pre>%timeit np.dot(a,b)
10 loops, best of 3: 27.4 ms per loop</pre>
<div><br></div>
<pre>%timeit np.einsum('i,ij,j',d,b,c)</pre>
<pre>1000 loops, best of 3: 709 us per loop</pre>
<pre>%timeit np.dot(d,np.dot(b,c))
10000 loops, best of 3: 121 us per loop</pre>
<div><br></div>
<div>or</div>
<pre>abig = np.arange(4800.).reshape(6,8,100)
bbig = np.arange(1920.).reshape(8,6,40)

%timeit np.einsum('ijk,jil->kl', abig, bbig)
1000 loops, best of 3: 425 us per loop
%timeit np.tensordot(abig,bbig, axes=([1,0],[0,1]))
10000 loops, best of 3: 105 us per loop</pre>
<div><br></div>
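<div>For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. It first checks that each einsum call gives the same result as the dot/tensordot call it is timed against, then reports the best of three runs. The helper best_time and the cases dictionary are names invented for this sketch. The last two lines assume NumPy 1.12 or newer, where einsum accepts an optimize keyword; that keyword does not exist in the 1.6.2 release used above, and no timings are claimed for it here.</div>

```python
import timeit
import numpy as np

# Arrays exactly as defined in the post.
a = np.arange(600000.).reshape(1500, 400)
b = np.arange(240000.).reshape(400, 600)
c = np.arange(600)
d = np.arange(400)
abig = np.arange(4800.).reshape(6, 8, 100)
bbig = np.arange(1920.).reshape(8, 6, 40)

# Each entry pairs an einsum call with the dot/tensordot call it is compared to.
cases = {
    "matrix product": (
        lambda: np.einsum('ij,jk', a, b),
        lambda: np.dot(a, b),
    ),
    "vector-matrix-vector": (
        lambda: np.einsum('i,ij,j', d, b, c),
        lambda: np.dot(d, np.dot(b, c)),
    ),
    "tensor contraction": (
        lambda: np.einsum('ijk,jil->kl', abig, bbig),
        lambda: np.tensordot(abig, bbig, axes=([1, 0], [0, 1])),
    ),
}


def best_time(fn, number=1, repeat=3):
    """Return the best time in seconds per call over `repeat` runs."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


results = {}
for name, (via_einsum, via_dot) in cases.items():
    # Make sure both spellings compute the same thing before comparing speed.
    assert np.allclose(via_einsum(), via_dot())
    results[name] = (best_time(via_einsum), best_time(via_dot))
    print("%-22s einsum %.3g s, dot/tensordot %.3g s" % ((name,) + results[name]))

# Assumption: NumPy >= 1.12, where einsum takes an `optimize` keyword.
opt = np.einsum('ij,jk', a, b, optimize=True)
assert np.allclose(opt, np.dot(a, b))
```

<div>The numbers will differ from those quoted above depending on the machine, the NumPy version and the BLAS library NumPy was built against.</div>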