I know this question has been asked before, both on this list and in
several threads on Stack Overflow, etc. It's a common issue. I'm NOT asking
how to do this using existing Numpy functions (that information can be
found in any of those sources). What I'm asking is whether Numpy would
accept a function that does this, or whether (possibly more likely) such a
proposal has already been considered and rejected for some reason.
The task is this--there's a large array and you want to find the next
element after some index that satisfies some condition. Such elements are
common, and the typical number of elements to be searched through is small
relative to the size of the array. Therefore, it would greatly improve
performance to avoid testing ALL elements against the conditional once one
is found that returns True. However, all built-in functions that I know of
test the entire array.
One can obviously jury-rig a workaround, for instance a "for" loop over
non-overlapping slices of length slice_length that calls something like
np.where(cond) on each slice. That outer "for" loop is much faster than a
loop over individual elements, and it tests at most slice_length-1 elements
past the first "hit". However, needing such a convoluted piece of code for
such a simple task seems to go against the Numpy spirit of having one
operation be one function of the form "func(arr)".
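For concreteness, the chunked workaround described above might look roughly like the sketch below. It assumes a 1-D array, a non-negative start_idx, and a cond callable that returns a boolean array; the name first_true_chunked and the default slice_length of 4096 are illustrative choices, not anything that exists in Numpy.

```python
import numpy as np

def first_true_chunked(arr, cond, start_idx=0, stop_idx=None, slice_length=4096):
    """Hypothetical helper: index of the first element of arr[start_idx:stop_idx]
    for which cond is True, or -1 if there is none. Assumes a 1-D array."""
    n = len(arr) if stop_idx is None else min(stop_idx, len(arr))
    # Outer Python loop over non-overlapping slices; cond is evaluated
    # vectorized on one slice at a time, so the scan stops at the first
    # slice containing a hit.
    for lo in range(start_idx, n, slice_length):
        hi = min(lo + slice_length, n)
        hits = np.where(cond(arr[lo:hi]))[0]  # positions relative to lo
        if hits.size:
            return lo + int(hits[0])
    return -1  # sentinel: no element satisfied cond

arr = np.zeros(1_000_000)
arr[123_456] = 1.0
print(first_true_chunked(arr, lambda x: x > 0))  # 123456
```

The slice length trades Python-level loop overhead (small slices) against wasted evaluations after the first hit (large slices), which is part of why a built-in short-circuiting function would be cleaner.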
A proposed function for this, let's call it "np.first_true(arr, start_idx,
[stop_idx])", would be best implemented at the C level, possibly in the
same source file that defines np.where. I'm wondering whether, if I or
someone else were to write such a function, the Numpy developers would
consider merging it as a standard part of the codebase. It's possible that
the idea of such a function is bad because it would violate some existing
broadcasting or fancy indexing rules. Clearly one could allow an "axis"
argument to np.first_true() that selects the axis to search along for
multi-dimensional arrays; the result would then be an array of indices
with one fewer dimension than the original array. So
np.first_true(np.array([[1,5],[2,7],[9,10]]), cond), searching along
axis 1, would return [1,1,0] for cond(x): x>4. The case where no elements
satisfy the condition would need to return a "signal value" like -1. But
maybe there are some weird cases where there isn't a sensible return value,
which would explain why such a function has not been added.
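The intended semantics (though not the performance) of the axis variant can be pinned down with existing functions. The sketch below is a hypothetical reference, first_true_reference, that takes np.argmax of the boolean mask along the chosen axis and uses np.any to substitute the -1 sentinel where nothing matches. Because it evaluates cond on the whole array, it does not short-circuit, which is exactly the cost the proposal aims to avoid.

```python
import numpy as np

def first_true_reference(arr, cond, axis=-1):
    """Hypothetical reference semantics for the proposed np.first_true:
    index of the first True along `axis`, or -1 where there is none.
    Evaluates cond on the entire array (no short-circuiting)."""
    mask = np.asarray(cond(arr), dtype=bool)
    idx = np.argmax(mask, axis=axis)    # first True; 0 if the lane is all False
    found = np.any(mask, axis=axis)     # distinguishes "hit at 0" from "no hit"
    return np.where(found, idx, -1)

a = np.array([[1, 5], [2, 7], [9, 10]])
print(first_true_reference(a, lambda x: x > 4, axis=1))  # [1 1 0]
```

The result has one fewer dimension than the input, matching the description above; a lane with no match, as in np.array([[1, 2], [5, 6]]) with x > 4, yields [-1, 0].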
-Andrew Rosko