[Numpy-discussion] New functions.

Neal Becker ndbecker2 at gmail.com
Wed Jun 1 11:59:52 EDT 2011


Short-circuiting find would be nice.  Right now, to 'find' something you first 
make a bool array, then iterate over it.  If all you want is the first index 
where x[i] == e, that is not very efficient.

What I just described is a find with a '==' predicate.  Not sure if it's 
worthwhile to consider other predicates.

Maybe call it 'find_first'
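
To make the idea concrete, here is a minimal pure-NumPy sketch of what a
'find_first' with an '==' predicate might look like.  It is only an
illustration, not an existing NumPy function: the signature, the `chunk`
parameter, and the -1 "not found" convention are all assumptions.  It
approximates short-circuiting by comparing one block at a time, so the scan
stops after the block containing the first match instead of building a bool
array for the whole input.

```python
import numpy as np

def find_first(x, e, chunk=4096):
    """Return the index of the first element of 1-D `x` equal to `e`, or -1.

    Hypothetical helper: name, signature, and return convention are
    illustrative assumptions, not part of NumPy.
    """
    x = np.asarray(x)
    for start in range(0, x.shape[0], chunk):
        # Compare only one block, so at most `chunk` extra elements are
        # examined past the first match.
        hits = np.flatnonzero(x[start:start + chunk] == e)
        if hits.size:
            return start + int(hits[0])
    return -1  # no element equals e
```

A compiled version could stop at the exact element rather than at a block
boundary; the block size here just trades Python loop overhead against
wasted comparisons.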

Mark Miller wrote:

> I'd love to see something like a "count_unique" function included. The
> numpy.unique function is handy, but it can be a little awkward to
> efficiently go back and get counts of each unique value after the
> fact.
> 
> -Mark
> 
> 
> 
> On Wed, Jun 1, 2011 at 8:17 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Tue, May 31, 2011 at 8:41 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>>> On Tue, May 31, 2011 at 8:50 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>>>> How about including all or some of Keith's Bottleneck package?
>>>> He has tried to include some of the discussed functions and tried to
>>>> make them very fast.
>>>
>>> I don't think they are sufficiently general as they are limited to 2
>>> dimensions. However, I think the moving filters should go into scipy, either
>>> in ndimage or maybe signal. Some of the others we can still speed up
>>> significantly, for instance nanmedian, by using the new functionality in
>>> numpy, i.e., numpy sort has worked with nans for a while now. It looks like
>>> call overhead dominates the nanmax times for small arrays and this might
>>> improve if the ufunc machinery is cleaned up a bit more; I don't know how
>>> far Mark got with that.
>>
>> Currently Bottleneck accelerates 1d, 2d, and 3d input. Anything else
>> falls back to a slower, non-Cython version of the function. The same
>> goes for dtypes: only int32, int64, float32, and float64 are accelerated.
>>
>> It should not be difficult to extend to higher nd and more dtypes
>> since everything is generated from a template. The problem is that there
>> would be a LOT of Cython auto-generated C code since there is a
>> separate function for each ndim, dtype, axis combination.
>>
>> Each of the ndim, dtype, axis functions currently has its own copy of
>> the algorithm (such as median). Pulling that out and reusing it should
>> save a lot of trees by reducing the auto-generated C code size.
>>
>> I recently added a partsort and argpartsort.

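On the "count_unique" request quoted above: a minimal sketch of one way to
get the counts together with the unique values, using numpy.unique's
return_inverse output and numpy.bincount.  The function name comes from
Mark's message; the signature and the (values, counts) return pair are
assumptions made for illustration.

```python
import numpy as np

def count_unique(a):
    """Return (unique_values, counts) for the flattened input.

    Hypothetical helper: the signature and return layout are
    illustrative assumptions.
    """
    a = np.asarray(a).ravel()
    # inverse[i] is the position of a[i] within the sorted unique values.
    values, inverse = np.unique(a, return_inverse=True)
    # Counting how often each position occurs gives the per-value counts.
    counts = np.bincount(inverse, minlength=values.size)
    return values, counts
```

For example, count_unique([3, 1, 3, 2, 1, 3]) gives values [1, 2, 3] and
counts [2, 1, 3].  NumPy versions whose numpy.unique accepts
return_counts=True can do the same in a single call.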