[Numpy-discussion] odd performance of sum?
Robert Kern
robert.kern at gmail.com
Thu Feb 10 17:08:38 EST 2011
On Thu, Feb 10, 2011 at 15:32, eat <e.antero.tammi at gmail.com> wrote:
> Hi Robert,
>
> On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>
>> On Thu, Feb 10, 2011 at 14:29, eat <e.antero.tammi at gmail.com> wrote:
>> > Hi Robert,
>> >
>> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern <robert.kern at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Feb 10, 2011 at 11:53, eat <e.antero.tammi at gmail.com> wrote:
>> >> > Thanks Chuck,
>> >> >
>> >> > for replying. But don't you still feel very odd that dot outperforms
>> >> > sum
>> >> > in
>> >> > your machine? Just to get it simply; why sum can't outperform dot?
>> >> > Whatever
>> >> > architecture (computer, cache) you have, it don't make any sense at
>> >> > all
>> >> > that
>> >> > when performing significantly less instructions, you'll reach to
>> >> > spend
>> >> > more
>> >> > time ;-).
>> >>
>> >> These days, the determining factor is less often instruction count
>> >> than memory latency, and the optimized BLAS implementations of dot()
>> >> heavily optimize the memory access patterns.
>> >
>> > Can't we have this as well with simple sum?
>>
>> It's technically feasible to accomplish, but as I mention later, it
>> entails quite a large cost. Those optimized BLASes represent many
>> man-years of effort
>
> Yes I acknowledge this. But didn't they then ignore them something simpler,
> like sum (but which actually could benefit exactly similiar optimizations).
Let's set aside the fact that the people who optimized the
implementation of dot() (the authors of ATLAS or the MKL or whichever
optimized BLAS library you linked to) are different from those who
implemented sum() (the numpy devs). Let me repeat a reason why one
would put a lot of effort into optimizing dot() but not sum():
"""
>> However, they are frequently worth it
>> because those operations are often bottlenecks in whole applications.
>> sum(), even in its stupidest implementation, rarely is.
"""
I don't know if I'm just not communicating very clearly, or if you
just reply to individual statements before reading the whole email.
>> and cause substantial headaches for people
>> building and installing numpy.
>
> I appreciate this. No doubt at all.
>>
>> However, they are frequently worth it
>> because those operations are often bottlenecks in whole applications.
>> sum(), even in its stupidest implementation, rarely is. In the places
>> where it is a significant bottleneck, an ad hoc implementation in C or
>> Cython or even FORTRAN for just that application is pretty easy to
>> write.
>
> But here I have to disagree; I'll think that at least I (if not even the
> majority of numpy users) don't like (nor I'm be capable/ or have enough
> time/ resources) go to dwell such details.
And you think we have the time and resources to do it for you?
> I'm sorry but I'll have to
> restate that it's quite reasonable to expect that sum outperforms dot in any
> case.
You don't optimize a function just because you are capable of it. You
optimize a function because it is taking up a significant portion of
total runtime in your real application. Anything else is a waste of
time.
> Lets now to start make such movements, which enables sum to outperform
> dot.
Sorry, you don't get to volunteer anyone's time or set anyone's
priorities but your own. There are some sensible things one could do
to optimize sums or even general ufunc reductions, as outlined my
Mark, Pauli and myself, but please stop using the accelerated-BLAS
dot() as your benchmark. It's a completely inappropriate comparison.
>> You can gain speed by specializing to just your use case, e.g.
>> contiguous data, summing down to one number, or summing along one axis
>> of only 2D data, etc. There's usually no reason to try to generalize
>> that implementation to put it back into numpy.
>
> Yes, I would really like to specialize into my case, but 'without going out
> the python realm.'
The Bottleneck project is a good place for such things. It's a nice
middle-ground for somewhat-specialized routines that are still pretty
common but not general enough to be in numpy yet.
http://pypi.python.org/pypi/Bottleneck
If you're not willing to learn how to implement it yourself in Cython,
I'm afraid that you're stuck waiting for someone to do it for you. But
please don't expect anyone to feel obligated to do so.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
More information about the NumPy-Discussion
mailing list