<br><br><div class="gmail_quote">On Thu, Feb 10, 2011 at 3:08 PM, Robert Kern <span dir="ltr"><<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<div><div></div><div class="h5">On Thu, Feb 10, 2011 at 15:32, eat <<a href="mailto:e.antero.tammi@gmail.com">e.antero.tammi@gmail.com</a>> wrote:<br>

> Hi Robert,<br>

><br>

> On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern <<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>> wrote:<br>

>><br>

>> On Thu, Feb 10, 2011 at 14:29, eat <<a href="mailto:e.antero.tammi@gmail.com">e.antero.tammi@gmail.com</a>> wrote:<br>

>> > Hi Robert,<br>

>> ><br>

>> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern <<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>><br>

>> > wrote:<br>

>> >><br>

>> >> On Thu, Feb 10, 2011 at 11:53, eat <<a href="mailto:e.antero.tammi@gmail.com">e.antero.tammi@gmail.com</a>> wrote:<br>

>> >> > Thanks Chuck,<br>

>> >> ><br>

>> >> > for replying. But don't you still find it very odd that dot outperforms<br>

>> >> > sum<br>

>> >> > in<br>

>> >> > your machine? To put it simply: why can't sum outperform dot?<br>

>> >> > Whatever<br>

>> >> > architecture (computer, cache) you have, it doesn't make any sense at<br>

>> >> > all<br>

>> >> > that<br>

>> >> > when performing significantly fewer instructions, you would<br>

>> >> > spend<br>

>> >> > more<br>

>> >> > time ;-).<br>

>> >><br>

>> >> These days, the determining factor is less often instruction count<br>

>> >> than memory latency, and the optimized BLAS implementations of dot()<br>

>> >> heavily optimize the memory access patterns.<br>

>> ><br>

>> > Can't we have this as well with simple sum?<br>

>><br>

>> It's technically feasible to accomplish, but as I mention later, it<br>

>> entails quite a large cost. Those optimized BLASes represent many<br>

>> man-years of effort<br>

><br>

> Yes, I acknowledge this. But didn't they then ignore something simpler,<br>

> like sum (which could actually benefit from very similar optimizations)?<br>

<br>

</div></div>Let's set aside the fact that the people who optimized the<br>

implementation of dot() (the authors of ATLAS or the MKL or whichever<br>

optimized BLAS library you linked to) are different from those who<br>

implemented sum() (the numpy devs). Let me repeat a reason why one<br>

would put a lot of effort into optimizing dot() but not sum():<br>

<div class="im"><br>

"""<br>

>> However, they are frequently worth it<br>

>> because those operations are often bottlenecks in whole applications.<br>

>> sum(), even in its stupidest implementation, rarely is.<br>

"""<br>

<br>

</div>I don't know if I'm just not communicating very clearly, or if you<br>

just reply to individual statements before reading the whole email.<br>

<div class="im"><br>

>> and cause substantial headaches for people<br>

>> building and installing numpy.<br>

><br>

> I appreciate this. No doubt at all.<br>

>><br>

>> However, they are frequently worth it<br>

>> because those operations are often bottlenecks in whole applications.<br>

>> sum(), even in its stupidest implementation, rarely is. In the places<br>

>> where it is a significant bottleneck, an ad hoc implementation in C or<br>

>> Cython or even FORTRAN for just that application is pretty easy to<br>

>> write.<br>

><br>

> But here I have to disagree; I think that at least I (if not the<br>

> majority of numpy users) don't want to (nor am I capable of, or have enough<br>

> time/resources to) dwell on such details.<br>

<br>

</div>And you think we have the time and resources to do it for you?<br>

<div class="im"><br>

> I'm sorry but I'll have to<br>

> restate that it's quite reasonable to expect that sum outperforms dot in any<br>

> case.<br>

<br>

</div>You don't optimize a function just because you are capable of it. You<br>

optimize a function because it is taking up a significant portion of<br>

total runtime in your real application. Anything else is a waste of<br>

time.<br>
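<br>
One way to check where the time actually goes before optimizing anything is to run the application under a profiler. The following is a minimal sketch, assuming Python's standard cProfile and pstats modules; the functions application, cheap_sum, and expensive_step are hypothetical stand-ins for illustration and are not code from this thread.<br>

```python
import cProfile
import io
import pstats


def cheap_sum(values):
    # Hypothetical stand-in for a simple reduction such as sum().
    return sum(values)


def expensive_step(n):
    # Hypothetical stand-in for the part of an application that
    # dominates total runtime.
    total = 0.0
    for i in range(n):
        for j in range(50):
            total += (i * j) % 7
    return total


def application():
    # Hypothetical workload: one cheap reduction plus one costly step.
    data = list(range(10_000))
    return cheap_sum(data) + expensive_step(20_000)


def profile_report(func, top=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(application)
    print(report)
```

In a report like this, only the entries that account for a large share of cumulative time are worth tuning; a reduction that barely registers is not a candidate, however simple it would be to speed up.<br>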

<div class="im"><br></div></blockquote><div><br>Heh. Reminds me of a passage in General Bradley's  <i><font size="2"><span id="btAsinTitle">A Soldier's Story </span></font></i><font size="2"><span id="btAsinTitle">where he admonished one of his officers in North Africa for taking a hill and suffering casualties, telling him that one didn't take a hill because one could, but because doing so served a purpose in the larger campaign.<br>

<br>&lt;snip&gt;<br><br>Chuck<br></span></font> </div><br></div>