[Numpy-discussion] slow numpy.clip ?

Eric Firing efiring at hawaii.edu
Mon Dec 18 13:53:40 EST 2006


David,

I think my earlier post got lost in the exchange between you and Stefan, 
so I will reiterate the central point: numpy.clip *is* slow, in that an 
implementation using putmask is substantially faster:

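from numpy import putmask

def fastclip(a, vmin, vmax):
    a = a.copy()                 # work on a copy; the caller's array is untouched
    putmask(a, a <= vmin, vmin)  # raise everything at or below vmin to vmin
    putmask(a, a >= vmax, vmax)  # lower everything at or above vmax to vmax
    return a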

Using the equivalent of this in a modification of your benchmark, the 
time using either the native clip *or* your alternative on my machine was 
about 2.3 s, versus 1.5 s for the putmask-based equivalent.  It seems 
that putmask is quite a bit faster than boolean indexing.
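
For anyone who wants to repeat the comparison, here is a minimal sketch of such a benchmark. It is not the contents of slowclip.py: the array size, the clip limits, the repeat count, and the helper names (indexclip, putmaskclip, time_it) are all assumptions made for illustration, and the timings will vary with machine and numpy version.

```python
import timeit
import numpy as np

def indexclip(a, vmin, vmax):
    # Clip via boolean indexing (assumed form of the alternative).
    a = a.copy()
    a[a < vmin] = vmin
    a[a > vmax] = vmax
    return a

def putmaskclip(a, vmin, vmax):
    # Clip via putmask, following the same pattern as fastclip.
    a = a.copy()
    np.putmask(a, a < vmin, vmin)
    np.putmask(a, a > vmax, vmax)
    return a

def time_it(func, a, vmin, vmax, repeat=10):
    # Total wall-clock seconds for `repeat` calls of func.
    return timeit.timeit(lambda: func(a, vmin, vmax), number=repeat)

if __name__ == "__main__":
    rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
    data = rng.standard_normal((1000, 1000))  # assumed test array
    for func in (np.clip, indexclip, putmaskclip):
        print(func.__name__, time_it(func, data, -1.0, 1.0))
```

All three functions should return identical arrays, so any difference is purely one of speed.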

Obviously, the function above could be implemented as a method, and a 
copy kwarg could be used to make the copy optional--often one does not 
need a copy.
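
A rough sketch of what the optional copy could look like, written as a plain function rather than a method. The copy keyword here is hypothetical, not an existing numpy argument, and the sketch assumes that a is already an ndarray.

```python
import numpy as np

def fastclip(a, vmin, vmax, copy=True):
    # `copy` is a hypothetical keyword for illustration only.
    if copy:
        a = a.copy()               # leave the caller's array untouched
    np.putmask(a, a <= vmin, vmin)  # clamp from below
    np.putmask(a, a >= vmax, vmax)  # clamp from above
    return a
```

With copy=False the input array is modified in place and returned, which saves one full-size allocation when the original values are no longer needed.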

It is also clear that it should be possible to make a much faster native 
clip function that does everything in one pass with no intermediate 
arrays at all.  Whether this is something numpy devels would want to do, 
and how much effort it would take, are entirely different questions.  I 
looked at the present code in clip (and part of the way through the 
chain of functions it invokes) and was quite baffled.
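
To make the one-pass idea concrete, here is a sketch of the logic in Python. The function names and the out argument are assumptions for illustration; the first function only shows what a native loop would do per element and is far too slow for real use, while the second is a compromise that takes two ufunc passes but creates no boolean mask arrays. Both assume a is an ndarray whose dtype can hold vmin and vmax.

```python
import numpy as np

def onepass_clip_reference(a, vmin, vmax, out=None):
    # Hypothetical reference: each element is visited once, no mask arrays.
    # A Python-level loop is slow; a native version would do this in C.
    if out is None:
        out = np.empty_like(a)
    for idx in np.ndindex(a.shape):
        x = a[idx]
        if x < vmin:
            out[idx] = vmin
        elif x > vmax:
            out[idx] = vmax
        else:
            out[idx] = x
    return out

def twopass_clip(a, vmin, vmax, out=None):
    # Two passes over the data, but no boolean temporaries.
    if out is None:
        out = np.empty_like(a)
    np.maximum(a, vmin, out=out)    # clamp from below into out
    np.minimum(out, vmax, out=out)  # clamp from above, in place
    return out
```

Passing out=a to either function clips in place without allocating anything.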

Eric

David Cournapeau wrote:
> Stefan van der Walt wrote:
>> On Mon, Dec 18, 2006 at 05:45:09PM +0900, David Cournapeau wrote:
>>> Yes, I of course mistyped the < and the copy. But the function is still 
>>> moderately faster on my workstation:
>>>
>>>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>>         1    0.003    0.003    3.944    3.944 slowclip.py:10(bench_clip)
>>>         1    0.011    0.011    2.001    2.001 slowclip.py:16(clip1_bench)
>>>        10    1.990    0.199    1.990    0.199 /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:372(clip)
>>>         1    1.682    1.682    1.682    1.682 slowclip.py:19(clip2_bench)
>>>         1    0.258    0.258    0.258    0.258 slowclip.py:6(generate_data_2d)
>>>         0    0.000             0.000          profile:0(profiler)
>> Did you try swapping the order of execution (i.e. clip1 second)?
> Yes, I tried different orders, etc., and it showed the same pattern. 
> The thing is, this kind of thing is highly CPU dependent in my 
> experience; I don't have the time right now to update numpy/scipy on my 
> laptop, but it happens that profiling results are quite different between 
> my workstation (P4 Xeon) and my laptop (Pentium M).
> 
> Anyway, contrary to what I first thought, the real problem is the copy, 
> so this is where I should investigate in the matplotlib case.
> 
> David
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
