[Numpy-discussion] slow numpy.clip ?

Tue Dec 19 21:30:45 EST 2006

Travis Oliphant wrote:
> Robert Kern wrote:
>> David Cournapeau wrote:
>>
>>   
>>> Basically, at least from those figures, both versions are pretty 
>>> similar, and not worth improving much anyway for matplotlib. There is 
>>> something funny with numpy version, though.
>>>     
>> Looking at the code, it's certainly not surprising that the current
>> implementation of clip() is slow. It is a direct numpy C API translation of the
>> following (taken from numarray, but it is the same in Numeric):
>>
>>
>> def clip(m, m_min, m_max):
>>     """clip()  returns a new array with every entry in m that is less than m_min
>>     replaced by m_min, and every entry greater than m_max replaced by m_max.
>>     """
>>     selector = ufunc.less(m, m_min)+2*ufunc.greater(m, m_max)
>>     return choose(selector, (m, m_min, m_max))
>>
>>   
>
> There are a lot of functions that are essentially this.   Many things 
> were done to just get something working.  It would seem like a good idea 
> to re-code many of these to speed them up.
>> Creating that integer selector array is probably the most expensive part.
>> Copying the array, then using putmask() or similar is certainly a better
>> approach, and I can see no drawbacks to it.
>>
>> If anyone is up to translating their faster clip() into C, I'm more than happy
>> to check it in. I might also entertain adding a copy=True keyword argument, but
>> I'm not entirely certain we should be expanding the API during the 1.0.x series.
>>
>>   
> The problem with the copy=True keyword is that it would imply needing to 
> expand the C-API for PyArray_Clip and should not be done until 1.1 IMHO.
>
> We would probably be better off not expanding the keyword arguments to 
> methods as well until that time.
When I got back home, I started taking a close look at the numpy/core C 
sources, with the help of the numpy ebook. The huge source files make it 
really difficult for me to follow some things: I was wondering if there 
is some rationale behind this layout, or if it is just a remnant of 
earlier numpy development.

The main problem I have with those huge files is that I cannot easily 
tell which functions are part of the public API, which ones are kept for 
backward compatibility, etc. I wanted to extract the PyArray_TakeFrom 
function to see where the time is spent, but this is quite difficult 
because of its various dependencies.
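
Until the C side can be profiled directly, the two strategies discussed 
above can at least be compared from Python. The following is only a 
rough sketch, not the actual C implementation: clip_choose and 
clip_putmask are hypothetical names, the first mirrors the quoted 
numarray code using the numpy ufuncs, and the second is one possible 
reading of the "copy, then putmask()" suggestion. It assumes 
m_min <= m_max and floating-point input.

```python
import timeit

import numpy as np


def clip_choose(m, m_min, m_max):
    """Selector-based clip, mirroring the quoted numarray code."""
    m = np.asarray(m)
    # selector is 0 where m is in range, 1 where m < m_min, 2 where m > m_max
    selector = np.less(m, m_min) + 2 * np.greater(m, m_max)
    return np.choose(selector, (m, m_min, m_max))


def clip_putmask(m, m_min, m_max):
    """Copy the input, then overwrite the out-of-range entries in place."""
    out = np.array(m, copy=True)
    np.putmask(out, out < m_min, m_min)
    np.putmask(out, out > m_max, m_max)
    return out


if __name__ == "__main__":
    # Arbitrary test data; size and bounds are illustrative assumptions
    a = np.random.randn(100_000)
    for f in (clip_choose, clip_putmask, np.clip):
        t = timeit.timeit(lambda: f(a, -0.5, 0.5), number=10)
        print(f.__name__, t)
```

The timings depend on the numpy version, the array size, and the 
machine, so they only give a relative picture of where the time goes.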

My question is then: is there any plan to change this? If not, is it 
for some reason I don't see, or is it just because of a lack of manpower?

cheers,

David