If we don't have an operation for this in numpy's setops module, it probably should be added.

On Nov 28, 2014 10:21 PM, "Jaime Fernández del Río" <jaime.frio@gmail.com> wrote:

On Fri, Nov 28, 2014 at 5:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Nov 28, 2014 at 3:15 AM, Alexander Belopolsky <ndarray@mac.com> wrote:
> I am probably missing something very basic, but given two arrays a and b,
> how can I find the positions in a where the elements of b are located? If a
> were sorted, I could use searchsorted, but I don't want to get valid
> positions for elements that are not in a. In my case, a has unique elements,
> but in the general case I would accept the first match. In other words, I am
> looking for an array analog of the list.index() method.

How about this?

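import numpy as np

def index(haystack, needle):
    haystack = np.asarray(haystack)
    # Indices that would sort haystack, and the sorted values themselves.
    haystack_sort = np.argsort(haystack)
    haystack_sorted = haystack[haystack_sort]
    # Map each position in sorted order back to an index into haystack.
    return haystack_sort[np.searchsorted(haystack_sorted, needle)]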

(Note that this will return incorrect results if any entries in needle
are missing from haystack entirely. If this is a concern then you need
to do some extra error-checking on the searchsorted return value.)
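
To make that caveat concrete, here is a small sketch of what can go wrong. The sample values are made up for illustration; the function is the one given above, repeated so the snippet runs on its own.

```python
import numpy as np

def index(haystack, needle):
    haystack = np.asarray(haystack)
    haystack_sort = np.argsort(haystack)
    haystack_sorted = haystack[haystack_sort]
    return haystack_sort[np.searchsorted(haystack_sorted, needle)]

haystack = np.array([30, 10, 20])

# Every needle is present, so the returned positions are correct.
present = index(haystack, [20, 30])

# 15 is absent, but the call silently returns the position of 20,
# the next larger entry in sorted order.
absent = index(haystack, [15])

# 40 exceeds every entry, so searchsorted returns len(haystack)
# and the final indexing step raises IndexError.
try:
    index(haystack, [40])
    raised = False
except IndexError:
    raised = True
```

So a missing needle either yields a plausible but wrong position or an exception, depending on where it would sort.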

I like this approach a lot. You can actually skip the creation of the haystack_sorted array using the sorter kwarg:

idx = haystack_sort[np.searchsorted(haystack, needle, sorter=haystack_sort)]

But whether or not you use haystack_sorted, if any item in needle is larger than the largest entry in haystack, searchsorted returns len(haystack) for it and the indexing fails with an out-of-bounds IndexError. So the whole thing with proper error checking gets kind of messy, something along the lines of:

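sorted_idx = np.searchsorted(haystack, needle, sorter=haystack_sort)
# Positions equal to len(haystack) would index out of bounds; mask them off.
mask_idx = sorted_idx < len(haystack)
idx = haystack_sort[sorted_idx[mask_idx]]
# Keep only the candidates that really are exact matches.
mask_in_haystack = haystack[idx] == needle[mask_idx]
mask_idx[mask_idx] &= mask_in_haystack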

So using -1 to indicate items in needle not found in haystack, you could do:

ret = np.empty_like(needle, dtype=np.intp)
ret[~mask_idx] = -1
ret[mask_idx] = idx[mask_in_haystack]
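
Putting the pieces together, a minimal self-contained sketch might look like the following. The name first_index and the missing parameter are hypothetical, not anything NumPy provides. It also swaps in a stable argsort (an addition to the snippets above) so that duplicates in haystack resolve to the first occurrence, and it clips out-of-range positions with np.where instead of masking in place.

```python
import numpy as np

def first_index(haystack, needle, missing=-1):
    """For each item of needle, return its first index in haystack, or `missing`.

    Hypothetical helper for illustration; not part of NumPy.
    """
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    if haystack.size == 0:
        # Nothing can match an empty haystack.
        return np.full(needle.shape, missing, dtype=np.intp)
    # A stable sort keeps equal items in their original order, so the
    # leftmost match in sorted order is the first occurrence in haystack.
    sorter = np.argsort(haystack, kind="stable")
    pos = np.searchsorted(haystack, needle, sorter=sorter)
    # pos == len(haystack) means the item is larger than every entry.
    in_bounds = pos < len(haystack)
    # Replace out-of-range positions by 0 so the lookup cannot fail;
    # those entries are rejected by in_bounds below anyway.
    candidates = sorter[np.where(in_bounds, pos, 0)]
    found = in_bounds & (haystack[candidates] == needle)
    return np.where(found, candidates, missing)

result = first_index([30, 10, 20, 10], [10, 20, 99, 5, 30])
```

Here 10 appears twice in the haystack and resolves to its first position, while 99 and 5 are absent and map to -1.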

In the end, it does get kind of messy, but I am not sure how it could be improved. Perhaps by giving searchsorted an option to report only exact matches?

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
