[Numpy-discussion] odd performance of sum?

Charles R Harris charlesr.harris at gmail.com
Thu Feb 10 17:32:13 EST 2011


On Thu, Feb 10, 2011 at 3:08 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Thu, Feb 10, 2011 at 15:32, eat <e.antero.tammi at gmail.com> wrote:
> > Hi Robert,
> >
> > On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern <robert.kern at gmail.com> wrote:
> >>
> >> On Thu, Feb 10, 2011 at 14:29, eat <e.antero.tammi at gmail.com> wrote:
> >> > Hi Robert,
> >> >
> >> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern <robert.kern at gmail.com> wrote:
> >> >>
> >> >> On Thu, Feb 10, 2011 at 11:53, eat <e.antero.tammi at gmail.com> wrote:
> >> >> > Thanks Chuck,
> >> >> >
> >> >> > for replying. But don't you still find it very odd that dot
> >> >> > outperforms sum on your machine? To put it simply: why can't sum
> >> >> > outperform dot? Whatever architecture (computer, cache) you have,
> >> >> > it doesn't make any sense at all that performing significantly
> >> >> > fewer instructions should take more time ;-).
> >> >>
> >> >> These days, the determining factor is less often instruction count
> >> >> than memory latency, and the optimized BLAS implementations of dot()
> >> >> heavily optimize the memory access patterns.
> >> >
> >> > Can't we have this as well with simple sum?
> >>
> >> It's technically feasible to accomplish, but as I mention later, it
> >> entails quite a large cost. Those optimized BLASes represent many
> >> man-years of effort
> >
> > Yes, I acknowledge this. But didn't they then ignore something simpler,
> > like sum (which could actually benefit from exactly similar
> > optimizations)?
>
> Let's set aside the fact that the people who optimized the
> implementation of dot() (the authors of ATLAS or the MKL or whichever
> optimized BLAS library you linked to) are different from those who
> implemented sum() (the numpy devs). Let me repeat a reason why one
> would put a lot of effort into optimizing dot() but not sum():
>
> """
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is.
> """
>
> I don't know if I'm just not communicating very clearly, or if you
> just reply to individual statements before reading the whole email.
>
> >> and cause substantial headaches for people
> >> building and installing numpy.
> >
> > I appreciate this. No doubt at all.
> >>
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is. In the places
> >> where it is a significant bottleneck, an ad hoc implementation in C or
> >> Cython or even FORTRAN for just that application is pretty easy to
> >> write.
> >
> > But here I have to disagree; I think that at least I (if not even the
> > majority of numpy users) don't want to (nor am I capable of, or have
> > enough time/resources to) dwell on such details.
>
> And you think we have the time and resources to do it for you?
>
> > I'm sorry, but I have to
> > restate that it's quite reasonable to expect sum to outperform dot in any
> > case.
>
> You don't optimize a function just because you are capable of it. You
> optimize a function because it is taking up a significant portion of
> total runtime in your real application. Anything else is a waste of
> time.
>
>
Heh. Reminds me of a passage in General Bradley's *A Soldier's Story* where
he admonished one of his officers in North Africa for taking a hill and
suffering casualties, telling him that one didn't take a hill because one
could, but because doing so served a purpose in the larger campaign.
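
For anyone who wants to see the sum/dot gap discussed above on their own machine, here is a minimal sketch, not the benchmark from the original post. It assumes NumPy is installed; the helper name time_sum_vs_dot, the array size, and the repeat counts are arbitrary choices made for illustration. Which call comes out ahead depends on the BLAS that NumPy was linked against, so no particular result should be expected.

```python
import timeit

import numpy as np


def time_sum_vs_dot(n=1_000_000, repeats=5, number=20):
    """Return (sum result, dot result, seconds per sum, seconds per dot).

    Hypothetical helper for illustration: it sums a 1-D float64 array
    once with ndarray.sum() and once as a dot product with a vector of ones.
    """
    rng = np.random.default_rng(0)  # fixed seed so runs are comparable
    a = rng.random(n)
    ones = np.ones(n)

    # Take the best of several repeats to reduce timing noise.
    t_sum = min(timeit.repeat(lambda: a.sum(), repeat=repeats, number=number)) / number
    t_dot = min(timeit.repeat(lambda: np.dot(a, ones), repeat=repeats, number=number)) / number

    return a.sum(), np.dot(a, ones), t_sum, t_dot


if __name__ == "__main__":
    s, d, t_sum, t_dot = time_sum_vs_dot()
    print(f"sum: {s:.6f} in {t_sum * 1e6:.1f} us per call")
    print(f"dot: {d:.6f} in {t_dot * 1e6:.1f} us per call")
```

Even where dot wins this microbenchmark, the point made above still applies: the difference only matters if a profile of the real application shows the summation taking a significant share of total runtime.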

<snip>

Chuck