Hi Robert,<br><br>

<div class="gmail_quote">On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern <span dir="ltr"><<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>></span> wrote:<br>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">

<div class="im">On Thu, Feb 10, 2011 at 11:53, eat <<a href="mailto:e.antero.tammi@gmail.com">e.antero.tammi@gmail.com</a>> wrote:<br>> Thanks Chuck,<br>><br>> for replying. But don't you still find it very odd that dot outperforms sum on<br>

> your machine? To put it simply: why can't sum outperform dot? Whatever<br>> architecture (computer, cache) you have, it doesn't make any sense at all that<br>> when performing significantly fewer instructions, you end up spending more<br>

> time ;-).<br><br></div>These days, the determining factor is less often instruction count<br>than memory latency, and the optimized BLAS implementations of dot()<br>heavily optimize the memory access patterns.</blockquote>
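<div>To make the comparison concrete, here is a minimal sketch of the kind of benchmark under discussion. It assumes the goal is row sums of a float64 matrix. a.sum(axis=1) goes through the generic reduction. a.dot(ones) computes the same values as a matrix-vector product, which NumPy typically hands to the linked BLAS. The matrix size, seed and repeat count are arbitrary choices, not eat's original benchmark. The relative timings will depend on the NumPy version and BLAS build.</div>

```python
import timeit

import numpy as np

# Arbitrary test matrix (an assumption, not the original benchmark data).
m, n = 1000, 1000
a = np.random.default_rng(0).random((m, n))
ones = np.ones(n)

# Row sums two ways: the generic add.reduce path, and a matrix-vector
# product with a vector of ones (typically dispatched to BLAS for float64).
row_sum = a.sum(axis=1)
row_dot = a.dot(ones)

# Equal up to floating-point rounding, since the summation order differs.
print(np.allclose(row_sum, row_dot))

t_sum = timeit.timeit(lambda: a.sum(axis=1), number=20)
t_dot = timeit.timeit(lambda: a.dot(ones), number=20)
print(f"sum: {t_sum:.4f}s  dot: {t_dot:.4f}s")
```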


<div>Can't we get this with a simple sum as well?</div>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">Additionally, the number<br>of instructions in your dot() probably isn't that many more than the<br>sum(). The sum() is pretty dumb</blockquote>


<div>But does it need to be?</div>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">and just does a linear accumulation<br>using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few<br>

instructions for traversing the array in a generic manner. With fused<br>multiply-adds, being able to assume contiguous data and ignore the<br>numpy iterator overhead, and applying divide-and-conquer kernels to<br>arrange sums, the optimized dot() implementations could have a<br>

comparable instruction count.</blockquote>
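<div>Robert says a divide-and-conquer arrangement of the sums need not cost more additions. A toy model in plain Python can check this. linear_sum and pairwise_sum are hypothetical helpers, not NumPy functions. For a left-to-right accumulation and for pairwise summation, they count the additions and the longest chain of additions that depend on each other. Both use n-1 additions. What changes is that dependency depth: n-1 versus ceil(log2 n). A shorter chain lets an optimized kernel keep several independent partial sums in flight.</div>

```python
def linear_sum(xs):
    """Left-to-right accumulation, as in a plain reduce loop (xs non-empty).
    Returns (total, number_of_adds, dependency_depth)."""
    total = xs[0]
    adds = 0
    for x in xs[1:]:
        total = total + x  # each add must wait for the previous result
        adds += 1
    return total, adds, adds  # the dependency chain is as long as the add count


def pairwise_sum(xs):
    """Divide-and-conquer: sum each half, then add the two partial sums (xs non-empty).
    Returns (total, number_of_adds, dependency_depth)."""
    if len(xs) == 1:
        return xs[0], 0, 0
    mid = len(xs) // 2
    left, adds_l, depth_l = pairwise_sum(xs[:mid])
    right, adds_r, depth_r = pairwise_sum(xs[mid:])
    # One extra add combines the halves; the halves are independent of each other.
    return left + right, adds_l + adds_r + 1, max(depth_l, depth_r) + 1


data = list(range(1000))
print(linear_sum(data))    # (499500, 999, 999)
print(pairwise_sum(data))  # (499500, 999, 10)
```

<div>For what it's worth, NumPy 1.9 and later do use a blocked pairwise scheme in add.reduce for floating-point data along the fast axis, as noted in the numpy.sum documentation.</div>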

<div>Couldn't sum benefit from similar logic?</div>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">If you were willing to spend that amount of developer time and code<br>complexity to make platform-specific backends to sum()</blockquote>


<div>Actually I would, but I'm not at all competent at that level of detail (:. But I'm willing to spend more of my own time on, for example, testing, debugging and analysing various improvements and suggestions, if any emerge.</div>


<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">, you could make<br>it go really fast, too. Typically, it's not all that important to make<br>it worthwhile, though. One thing that might be worthwhile is to make<br>

implementations of sum() and cumsum() that avoid the ufunc machinery<br>and do their iterations more quickly, at least for some common<br>combinations of dtype and contiguity.<br></blockquote>
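<div>The following snippet roughly illustrates Robert's description. It checks that sum() and cumsum() give the same results as the add ufunc's reduce and accumulate methods. That ufunc machinery is what a specialized implementation would bypass. could_use_fast_path is a hypothetical predicate, not a NumPy API. It only shows the kind of dtype and contiguity test that such a fast path might dispatch on.</div>

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# sum() and cumsum() go through the add ufunc's reduce/accumulate machinery.
print(np.array_equal(a.sum(axis=0), np.add.reduce(a, axis=0)))
print(np.array_equal(np.cumsum(a, axis=1), np.add.accumulate(a, axis=1)))


def could_use_fast_path(x):
    """Hypothetical predicate for a specialized sum: float64 and C-contiguous."""
    return x.dtype == np.float64 and bool(x.flags["C_CONTIGUOUS"])


print(could_use_fast_path(a))          # True
print(could_use_fast_path(a[:, ::2]))  # False: strided view
print(could_use_fast_path(a.T))        # False: transpose is only F-contiguous
```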

<div>Well, I'm already perplexed before even reaching that 'ufunc machinery'; it's actually far from trivial (for us mere mortals ;-) to figure out what's happening with sum in fromnumeric.py!</div>

<div> </div>

<div> </div>

<div>Regards,</div>

<div>eat</div>

<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote"><font color="#888888"><br>--<br>Robert Kern<br><br>"I have come to believe that the whole world is an enigma, a harmless<br>

enigma that is made terrible by our own mad attempt to interpret it as<br>though it had an underlying truth."<br>  -- Umberto Eco<br></font>

<div>

<div></div>

<div class="h5">_______________________________________________<br>NumPy-Discussion mailing list<br><a href="mailto:NumPy-Discussion@scipy.org">NumPy-Discussion@scipy.org</a><br><a href="http://mail.scipy.org/mailman/listinfo/numpy-discussion" target="_blank">http://mail.scipy.org/mailman/listinfo/numpy-discussion</a><br>

</div></div></blockquote></div><br>