<div dir="ltr"><div><div>Broadcasting, by itself, should not create large arrays in memory. It uses stride tricks to make an array appear larger while simply reusing the same memory block. That is why it is so valuable: it doesn't make a copy.<br><br></div>Now, what may be happening is that the result of the calculation on the broadcast arrays (10000 x 100 float64 values, about 8 MB per temporary) is too large to fit easily into the CPU cache, so the subsequent summation might be hitting performance penalties for that. Essentially, your first example may be a poor man's implementation of data chunking. I bet if you ran these timings over a wide range of sizes, you would see some interesting results.<br><br></div><div>Cheers!<br></div><div>Ben Root<br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Sep 14, 2014 at 10:53 PM, Ryan Nelson <span dir="ltr">&lt;<a href="mailto:rnelsonchem@gmail.com" target="_blank">rnelsonchem@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I think I answered my own question. I guess that the broadcasting approach is generating a very large 2D array in memory, which takes a bit of extra time. I gathered this from reading the last example on the following site:<div><a href="http://wiki.scipy.org/EricsBroadcastingDoc" target="_blank">http://wiki.scipy.org/EricsBroadcastingDoc</a><br></div><div>I tried this again with a much smaller "xs" array (~100 points) and the broadcasting version was much faster.</div><div>Thanks</div><div><br></div><div>Ryan</div><div><br></div><div>Note: The link to the SciPy wiki page above is broken at the bottom of NumPy's broadcasting page; otherwise I would have seen that earlier. Sorry for the noise.
</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Sep 14, 2014 at 10:22 PM, Ryan Nelson <span dir="ltr">&lt;<a href="mailto:rnelsonchem@gmail.com" target="_blank">rnelsonchem@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello all,<div><br></div><div>I have a question about the performance of broadcasting versus Python for loops. I have the following sample code that approximates a simulation I'd like to do:</div><div><br></div><div>## Test Code ##</div><div><p style="margin:0px">import numpy as np</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">def lorentz(x, pos, inten, hwhm):</p>

<p style="margin:0px">    return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) )</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">poss = np.random.rand(100)</p>

<p style="margin:0px">intens = np.random.rand(100)</p>

<p style="margin:0px">xs = np.linspace(0,10,10000)</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">def first_try():</p>

<p style="margin:0px">    sim_inten = np.zeros(xs.shape)</p>

<p style="margin:0px">    for freq, inten in zip(poss, intens):</p>

<p style="margin:0px">        sim_inten += lorentz(xs, freq, inten, 5.0)</p><p style="margin:0px">    return sim_inten</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">def second_try():</p>

<p style="margin:0px">    sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0)</p>

<p style="margin:0px">    sim_inten2 = sim_inten2.sum(axis=1)</p><p style="margin:0px">    return sim_inten2</p><p style="margin:0px"><br></p><p style="margin:0px">print(np.array_equal(first_try(), second_try()))<br></p><p style="margin:0px"><br></p><p style="margin:0px">## End Test ##</p><p style="margin:0px"><br></p><p style="margin:0px">Running this script prints "True" for the final equality test. However, IPython's %timeit magic gives ~10 ms for first_try and ~30 ms for second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux machine, both with Python 2.7 and NumPy 1.8.2.</p><p style="margin:0px"><br></p><p style="margin:0px">I understand in principle why broadcasting should be faster than Python loops, but I'm wondering why I'm getting worse results with the pure NumPy function. Are there any general rules for when broadcasting might give worse performance than a Python loop?</p><p style="margin:0px"><br></p><p style="margin:0px">Thanks</p><span><font color="#888888"><p style="margin:0px"><br></p><p style="margin:0px">Ryan</p><p style="margin:0px"><br></p></font></span></div></div>

</blockquote></div><br></div>

</div></div><br>_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br>

<a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

<br></blockquote></div><br></div>
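<div dir="ltr">A minimal sketch of the two points made in the reply at the top of this thread, assuming a NumPy release newer than the 1.8.2 used by the original poster (np.broadcast_to and np.shares_memory were added in later versions). The fixed random seed and the helper chunked_try, with its chunk parameter, are hypothetical and introduced only for illustration; it chunks over xs, whereas first_try effectively processes one peak at a time. The sketch shows that a broadcast view has a zero stride and shares memory with its source, that arithmetic on broadcast operands still allocates the full 10000 x 100 result, and one way the work could be split into smaller blocks.</div>

```python
import numpy as np


def lorentz(x, pos, inten, hwhm):
    # Same Lorentzian line shape as in the original post.
    return inten * (hwhm**2 / ((x - pos)**2 + hwhm**2))


rng = np.random.RandomState(0)  # fixed seed: an assumption, for repeatability
poss = rng.rand(100)
intens = rng.rand(100)
xs = np.linspace(0, 10, 10000)

# 1) Broadcasting alone only creates a view: the repeated axis has
#    stride 0 and the view shares memory with the original array.
view = np.broadcast_to(poss, (xs.size, poss.size))
view_strides = view.strides
view_shares = np.shares_memory(view, poss)

# 2) Arithmetic on broadcast operands does allocate the full result.
full = lorentz(xs.reshape((-1, 1)), poss, intens, 5.0)
full_mbytes = full.nbytes / 1e6  # size of one temporary, in MB


def chunked_try(chunk=1000):
    # Hypothetical helper: broadcast over blocks of xs so that each
    # temporary is (chunk x 100) instead of (10000 x 100).
    out = np.empty(xs.shape)
    for start in range(0, xs.size, chunk):
        block = xs[start:start + chunk].reshape((-1, 1))
        out[start:start + chunk] = lorentz(block, poss, intens, 5.0).sum(axis=1)
    return out
```

<div dir="ltr">Timing first_try, second_try, and chunked_try with IPython's %timeit over several sizes of xs and several values of chunk is one way to look for the point where, on a given machine, the large temporary starts to cost more than the Python loop overhead.</div>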