<div dir="ltr"><div><div><div>Right now I am using np.sum(A * B.T, axis=1) for dense data and I have implemented a Cython routine for sparse data.<br></div><div>I haven't benched np.sum(A * B.T, axis=1) vs. np.einsum("ij,ji->i", A, B) yet since I am mostly interested in the sparse case right now.<br></div><div><br></div>When A and B are C-style and Fortran-style, the optimal algorithm should be computing the inner products along the diagonal using BLAS. <br></div>If not, I guess this will need some benchmarking.<br><br></div><div>Another use for this is to compute the row-wise L2 norms: np.diagdot(A, A.T).<br></div><div><br></div><div>Mathieu<br><br></div><div><div><div class="gmail_extra"><div class="gmail_quote">On Fri, May 22, 2015 at 5:58 PM, David Cournapeau <span dir="ltr"><<a href="mailto:cournape@gmail.com" target="_blank">cournape@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Fri, May 22, 2015 at 5:39 PM, Mathieu Blondel <span dir="ltr"><<a href="mailto:mathieu@mblondel.org" target="_blank">mathieu@mblondel.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div><div>Hi,<br><br></div>I often need to compute the equivalent of<br><br></div>np.diag(np.dot(A, B)).<br><br></div>Computing np.dot(A, B) is highly inefficient if you only need the diagonal entries. Two more efficient ways of computing the same thing are<br><br>np.sum(A * B.T, axis=1)<br><br>and<br><br>np.einsum("ij,ji->i", A, B).<br><br></div>The first can allocate quite a lot of temporary memory.<br></div>The second can be quite cryptic for someone not familiar with einsum.<br><div>I assume that einsum does not compute np.dot(A, B), but I haven't verified.<br></div></div><div><br></div>Since this is is quite a recurrent pattern, I was wondering if it would be worth adding a dedicated function to NumPy and SciPy's sparse module. A possible name would be "diagdot". The best performance would be obtained when A is C-style and B fortran-style.<br></div></div></div></blockquote><div><br></div></div></div><div>Does your implementation use BLAS, or is just a a wrapper around einsum ?</div><span class=""><font color="#888888"><div><br></div><div>David</div></font></span></div><br></div></div>

<br>_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>

<a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

<br></blockquote></div><br></div></div></div></div>