<div dir="ltr">On Tue, Oct 8, 2013 at 4:38 AM, Ke Sun <span dir="ltr"><<a href="mailto:sunk.cs@gmail.com" target="_blank">sunk.cs@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class="im">On Tue, Oct 08, 2013 at 01:49:14AM -0700, Matthew Brett wrote:<br>

> Hi,<br>

><br>

> On Tue, Oct 8, 2013 at 1:06 AM, Ke Sun <<a href="mailto:sunk.cs@gmail.com">sunk.cs@gmail.com</a>> wrote:<br>

> > Dear all,<br>

> ><br>

> > I have written the following function to compute the square distances of a large<br>

> > matrix (each sample a row). It compute row by row and print the overall progress.<br>

> > The progress output is important and I didn't use matrix multiplication.<br>

> ><br>

> > I give as input a 70,000x800 matrix. The output should be a 70,000x70,000<br>

> > matrix. The program runs really slow (16 hours for 1/3 progress). And it eats<br>

> > 36G memory (fortunately I have enough).<br>

><br>

> That is very slow.<br>

><br>

> As a matter of interest - why didn't you use matrix multiplication?<br>

</div>Because it will cost hours and I want to see the progress and<br>

know how far it goes. Another concern is to save memory and<br>

compute one sample at a time.<br></blockquote><div><br></div><div>Another option you may want to consider is to do your calculation in chunks, not one item at a time, e.g.:</div><div><br></div><div><p style="margin:0px">

<font face="courier new, monospace">rows, cols = 70000, 800</font></p>

<p style="margin:0px"><font face="courier new, monospace">data = np.random.rand(rows, cols)</font></p>

<p style="margin:0px"><font face="courier new, monospace">chunks = 100</font></p>

<p style="margin:0px"><font face="courier new, monospace">chunk_len = rows // chunks</font></p>

<p style="margin:0px"><font face="courier new, monospace"><br></font></p>

<p style="margin:0px"><font face="courier new, monospace">out = np.empty((rows, rows))</font></p>

<p style="margin:0px"><font face="courier new, monospace">for j in xrange(0, rows, chunk_len):</font></p>

<p style="margin:0px"><font face="courier new, monospace">    chunk_j = data[j: j + chunk_len]</font></p>

<p style="margin:0px"><font face="courier new, monospace">        for k in xrange(j, rows, chunk_len):</font></p>

<p style="margin:0px"><font face="courier new, monospace">            chunk_k =  data[k: k + chunk_len]</font></p>

<p style="margin:0px"><font face="courier new, monospace">            out[j: j + chunk_len,</font></p>

<p style="margin:0px"><font face="courier new, monospace">                k: k + chunk_len] = np.dot(chunk_j, chunk_k.T)</font></p>

<p style="margin:0px"><font face="courier new, monospace">            if j != k:</font></p>

<p style="margin:0px"><font face="courier new, monospace">                out[k: k + chunk_len,</font></p>

<p style="margin:0px"><font face="courier new, monospace">                    j: j + chunk_len] = out[j: j + chunk_len,</font></p>

<p style="margin:0px"><font face="courier new, monospace">                                            k: k + chunk_len].T</font></p>

<p style="margin:0px"><font face="courier new, monospace">q = np.diag(out)</font></p>

<p style="margin:0px"><font face="courier new, monospace">out *= -2</font></p>

<p style="margin:0px"><font face="courier new, monospace">out += q</font></p>

<p style="margin:0px"><font face="courier new, monospace">out += q[:, np.newaxis]</font></p></div><div> </div><div>This way you can still gauge progress, use mostly the fast, efficient vectorized approach and probably offset the (relatively small amount of) Python looping by not calculating most of the symmetrical items.</div>

<div><br></div><div>Jaime</div><div><br></div></div></div></div>