[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

Thu Jan 11 10:58:28 EST 2007

David Cournapeau wrote:
> Francesc Altet wrote:
>> On Wednesday 10 January 2007 22:49, Stefan van der Walt wrote:
>>> On Wed, Jan 10, 2007 at 08:28:14PM +0100, Francesc Altet wrote:
>>>> On Tue, 09 Jan 2007 at 23:19 +0900, David Cournapeau wrote:
>>>> time (putmask)--> 1.38
>>>> time (where)--> 2.713
>>>> time (numexpr where)--> 1.291
>>>> time (fancy+assign)--> 0.967
>>>> time (numexpr clip)--> 0.596
>>>>
>>>> It is interesting to see here that fancy indexing + assignment is
>>>> considerably more efficient than putmask.
>>> Not on my machine:
>>>
>>> time (putmask)--> 0.181
>>> time (where)--> 0.783
>>> time (numexpr where)--> 0.26
>>> time (fancy+assign)--> 0.202
>>
>> Yeah, a big difference indeed. Just for reference, my results above
>> were obtained on a Duron (an Athlon with only 128 KB of secondary
>> cache) at 0.9 GHz. Now, using my laptop (Intel Pentium 4 @ 2 GHz,
>> 512 KB of secondary cache), I get:
>>
>> time (putmask)--> 0.244
>> time (where)--> 2.111
>> time (numexpr where)--> 0.427
>> time (fancy+assign)--> 0.316
>> time (numexpr clip)--> 0.184
>>
>> so, on my laptop fancy+assign is slower than putmask. It should
>> also be noted that the implementation of clip in numexpr (i.e. in
>> pure C) is not that much faster than putmask (only about 30%); so
>> perhaps a pure C implementation of clip is not really necessary
>> (or at least, not on Intel P4 machines!).
>>
>> In any case, it is really striking to see how differently the
>> various CPU architectures perform on this apparently simple problem.
> I am not sure it is such a simple problem: it involves massive branching.
To be more precise, you can do clipping without branching, but then the
clipping is highly type and machine dependent (using bit masks and other
tricks). It may be worth the trouble for double, float and int, dunno.

David