[Numpy-discussion] slow numpy.clip ?

Tue Dec 19 02:19:21 EST 2006

David Cournapeau wrote:
> Eric Firing wrote:
>> David,
>>
>> I think my earlier post got lost in the exchange between you and Stefan, 
>> so I will reiterate the central point: numpy.clip *is* slow, in that an 
>> implementation using putmask is substantially faster:
>>
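>> from numpy import putmask
>>
>> def fastclip(a, vmin, vmax):
>>     a = a.copy()                # leave the caller's array untouched
>>     putmask(a, a<=vmin, vmin)   # raise everything at or below vmin
>>     putmask(a, a>=vmax, vmax)   # lower everything at or above vmax
>>     return a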
>>
>> Using the equivalent of this in a modification of your benchmark, the 
>> time using the native clip *or* your alternative on my machine was 
>> about 2.3 s, versus 1.5 s for the putmask-based equivalent.  It seems 
>> that putmask is quite a bit faster than boolean indexing.
>>
>> Obviously, the function above could be implemented as a method, and a 
>> copy kwarg could be used to make the copy optional--often one does not 
>> need a copy.
>>
>> It is also clear that it should be possible to make a much faster native 
>> clip function that does everything in one pass with no intermediate 
>> arrays at all.  Whether this is something numpy devels would want to do, 
>> and how much effort it would take, are entirely different questions.  I 
>> looked at the present code in clip (and part of the way through the 
>> chain of functions it invokes) and was quite baffled.
> Well, this is something I would be willing to try *if* this is the main 
> bottleneck of imshow/show. I am still unsure about the problem, because 
> if I change numpy.clip to my function, including a copy, I really get a 
> big difference myself:
> 
> val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
>                                 mask=mask)
> 
> vs
> 
> def myclip(b, m, M):
>     a       = b.copy()
>     a[a<m]  = m
>     a[a>M]  = M
>     return a
> val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
> 
> Taking the best result of each, I get 0.888 ms vs 0.784 ms for a show() 
> call, which is already a 10 % improvement, and I get almost 15 % if I 
> remove the copy. I am updating numpy/scipy/mpl on my laptop to see if 
> this is specific to the CPU of my workstation (big cache, high clock 
> frequency, dual CPU with HT enabled).

Please try the putmask version without the copy on your machines; I 
expect it will be quite a bit faster on both machines.  The relative 
speeds of the versions may differ widely depending on how many values 
actually get changed, though.
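
For concreteness, here is a minimal sketch of the no-copy putmask variant, with a small timing loop that varies the fraction of values that get changed. It assumes a current numpy; the names putmask_clip and best_time and the copy keyword are made up for illustration, and this is not the benchmark that produced the numbers quoted above.

```python
import time

import numpy as np


def putmask_clip(a, vmin, vmax, copy=True):
    """Clip a to [vmin, vmax] using two putmask passes.

    copy=False modifies a in place and returns it (hypothetical keyword).
    """
    if copy:
        a = a.copy()
    np.putmask(a, a < vmin, vmin)
    np.putmask(a, a > vmax, vmax)
    return a


def best_time(fraction_clipped, n=200_000, repeat=5):
    """Best-of-repeat wall time in seconds for the in-place version.

    The data are uniform on [0, 1), so roughly fraction_clipped of the
    values fall outside [vmin, vmax] and get overwritten.
    """
    rng = np.random.default_rng(0)
    data = rng.random(n)
    vmin = fraction_clipped / 2.0
    vmax = 1.0 - fraction_clipped / 2.0
    best = float("inf")
    for _ in range(repeat):
        work = data.copy()  # fresh input; this copy is outside the timed region
        t0 = time.perf_counter()
        putmask_clip(work, vmin, vmax, copy=False)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    for frac in (0.0, 0.1, 0.5, 0.9):
        print(frac, best_time(frac))
```

Each timed call gets a fresh copy of the input, because an in-place clip leaves nothing to change on the second pass over the same array and the later timings would otherwise look better than they should.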

Eric