[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

David Cournapeau david at ar.media.kyoto-u.ac.jp
Thu Jan 11 00:03:15 EST 2007

Stefan van der Walt wrote:
> On Wed, Jan 10, 2007 at 08:28:14PM +0100, Francesc Altet wrote:
>> On Tue, 09 Jan 2007 at 23:19 +0900, David Cournapeau wrote:
>>> Hi,
>>>     I am finally implementing a C function to replace the current slow
>>> implementation of clip in Python, as promised a few weeks ago. The idea
>>> is to implement it as follows:
>>> def clip(input, min, max):
>>>     a   = input.copy()
>>>     putmask(a, a <= min, min)
>>>     putmask(a, a >= max, max)
>>>     return a
>>> I don't have that much experience in writing general numpy functions, so
>>> I was wondering if other people could advise me on the following points.
>> Sorry, but I have no real experience writing extensions directly in C.
>> However, you may want to experiment with numexpr for doing what you
>> want. It's relatively easy to extend numexpr by adding a new opcode to
>> its virtual machine. I'm attaching a patch implementing such a clip
>> routine (only for floating point types, but given the example, it should
>> be straightforward to add support for integers as well).
>> Also, you should note that using the fancy indexing of numpy seems to
>> perform better than the putmask approach. Below are my figures for a
>> small benchmark (also attached) for testing the performance of clip
>> using several approaches:
>> time (putmask)--> 1.38
>> time (where)--> 2.713
>> time (numexpr where)--> 1.291
>> time (fancy+assign)--> 0.967
>> time (numexpr clip)--> 0.596
>> It is interesting to see that fancy indexing + assignment is
>> considerably more efficient than putmask.
> Not on my machine:
> time (putmask)--> 0.181
> time (where)--> 0.783
> time (numexpr where)--> 0.26
> time (fancy+assign)--> 0.202
When I started looking at this, I used the indexing method, and
someone else proposed putmask, a function I was not aware of at the
time. Both are similar in speed, and vastly faster (almost an order of
magnitude on moderately sized contiguous arrays) than the current
numpy clip.
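
To make the two approaches concrete, here is a minimal sketch of both.
The function names clip_putmask and clip_fancy are made up for this
illustration; this is not the benchmark script attached earlier in the
thread, and it makes no claim about which variant is faster on a given
machine.

```python
import numpy as np

def clip_putmask(a, lo, hi):
    # Copy first so the caller's array is left untouched.
    out = a.copy()
    np.putmask(out, out < lo, lo)   # overwrite values below the lower bound
    np.putmask(out, out > hi, hi)   # overwrite values above the upper bound
    return out

def clip_fancy(a, lo, hi):
    # Same idea, using boolean ("fancy") indexing plus assignment.
    out = a.copy()
    out[out < lo] = lo
    out[out > hi] = hi
    return out

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(clip_putmask(x, -1.0, 1.0))
print(clip_fancy(x, -1.0, 1.0))
```

Both functions build a temporary boolean mask for each bound, which is
part of the cost a dedicated C loop can avoid.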

My current C implementation does not use the equivalent of putmask. I
try to determine which cases are easy (basically, when the datatype is a
numeric datatype in native byte order) and handle those directly;
other cases (non-native endian, objects, etc.) are simply forwarded to
the original function for now.
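
The dispatch described above can be sketched in Python as follows. This
is only an illustration of the idea, not the actual C code: the names
is_easy_case, slow_clip and fast_clip are hypothetical, the "easy" test
is an assumption about what counts as numeric (real integer and float
kinds only), and np.minimum/np.maximum stand in for the direct C loop.

```python
import numpy as np

def slow_clip(a, lo, hi):
    # Stand-in for the existing generic implementation (hypothetical name).
    out = a.copy()
    out[out < lo] = lo
    out[out > hi] = hi
    return out

def is_easy_case(a):
    # Assumption: "easy" means a real numeric dtype (signed/unsigned
    # integer or float) stored in native byte order.
    return a.dtype.kind in "iuf" and a.dtype.isnative

def fast_clip(a, lo, hi):
    if not is_easy_case(a):
        # Non-native endian, objects, etc.: forward to the generic path.
        return slow_clip(a, lo, hi)
    # Stand-in for the direct C loop over the raw buffer.
    return np.minimum(np.maximum(a, lo), hi)

native = np.array([-2.0, 0.5, 3.0])
swapped = native.astype(native.dtype.newbyteorder("S"))  # opposite byte order
print(is_easy_case(native), is_easy_case(swapped))
print(fast_clip(native, -1.0, 1.0), fast_clip(swapped, -1.0, 1.0))
```

Keeping the fallback means the fast path only has to be correct for the
cases it explicitly accepts.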

The main difficulty is that I am not aware of all the datatypes that
numpy functions are supposed to handle (for example, when I started, I
didn't know that numpy could handle non-native-endian data, which makes
things a bit more complicated to support in C).
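
For readers who have not met non-native-endian arrays, the short sketch
below shows why they matter to C code: the values compare equal at the
numpy level, but the bytes in memory differ, so a C loop reading the
buffer directly would see wrong numbers. The variable names are
illustrative only.

```python
import numpy as np

native = np.arange(4, dtype=np.int32)
# "S" swaps the byte order relative to the current dtype, so the result
# is non-native whatever the endianness of the machine.
swapped = native.astype(native.dtype.newbyteorder("S"))

print(native.dtype.isnative, swapped.dtype.isnative)
print(np.array_equal(native, swapped))            # same logical values
print(native.tobytes() == swapped.tobytes())      # different raw bytes

# Converting back to native order yields a buffer C code can read directly.
back = swapped.astype(swapped.dtype.newbyteorder("="))
print(back.dtype.isnative)
```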


