[Numpy-discussion] Finding values in an array

Jaime Fernández del Río jaime.frio at gmail.com
Fri Nov 28 22:21:32 EST 2014


On Fri, Nov 28, 2014 at 5:15 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Nov 28, 2014 at 3:15 AM, Alexander Belopolsky <ndarray at mac.com> wrote:
> > I am probably missing something very basic, but given two arrays a and b,
> > how can I find the positions in a where the elements of b are located? If
> > a were sorted, I could use searchsorted, but I don't want to get valid
> > positions for elements that are not in a. In my case, a has unique
> > elements, but in the general case I would accept the first match. In
> > other words, I am looking for an array analog of the list.index() method.
>
> How about this?
>
> def index(haystack, needle):
>     haystack = np.asarray(haystack)
>     haystack_sort = np.argsort(haystack)
>     haystack_sorted = haystack[haystack_sort]
>     return haystack_sort[np.searchsorted(haystack_sorted, needle)]
>
> (Note that this will return incorrect results if any entries in needle
> are missing from haystack entirely. If this is a concern then you need
> to do some extra error-checking on the searchsorted return value.)
>
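To make that caveat concrete, here is a small sketch that runs the index() function above unchanged, assuming only NumPy (the sample values are illustrative, not from the thread). A missing value that falls between entries silently maps to a neighbouring position, and a value larger than every entry makes the final indexing step raise IndexError.

```python
import numpy as np

def index(haystack, needle):
    # The function from the quoted message, unchanged apart from the import.
    haystack = np.asarray(haystack)
    haystack_sort = np.argsort(haystack)
    haystack_sorted = haystack[haystack_sort]
    return haystack_sort[np.searchsorted(haystack_sorted, needle)]

haystack = np.array([10, 30, 20])

# Every needle is present: the positions are correct.
print(index(haystack, [20, 10]))  # [2 0]

# 25 is missing, but searchsorted still returns an insertion point (2),
# which maps back to the position of 30: a silently wrong answer.
print(index(haystack, [25]))  # [1]

# 40 is larger than every entry, so its insertion point is len(haystack),
# and indexing haystack_sort with it is out of bounds.
try:
    index(haystack, [40])
except IndexError as exc:
    print("IndexError:", exc)
```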

I like this approach a lot. You can actually skip the creation of the
haystack_sorted array using the sorter kwarg:

    idx = haystack_sort[np.searchsorted(haystack, needle,
                                        sorter=haystack_sort)]
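A quick sketch (sample arrays chosen for illustration) checking that searching the unsorted haystack through sorter=haystack_sort gives the same insertion points as searching an explicitly sorted copy, so the copy can be dropped:

```python
import numpy as np

haystack = np.array([10, 30, 20, 50, 40])
needle = np.array([40, 10, 30])
haystack_sort = np.argsort(haystack)

# Insertion points found by searching a materialized sorted copy...
via_copy = np.searchsorted(haystack[haystack_sort], needle)
# ...and by searching the original array through the sorting permutation.
via_sorter = np.searchsorted(haystack, needle, sorter=haystack_sort)

# Map the insertion points back to positions in the original haystack.
idx = haystack_sort[via_sorter]
print(idx)  # [4 0 1]
```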

But whether or not you use haystack_sorted, if any item in needle is
larger than the largest entry in haystack, searchsorted returns
len(haystack) for it, and the indexing fails with an index out of bounds
error. So the whole thing with proper error checking gets kind of messy,
something along the lines of:

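    sorted_idx = np.searchsorted(haystack, needle, sorter=haystack_sort)
    # Drop insertion points past the end, which would index out of bounds
    mask_idx = sorted_idx < len(haystack)
    idx = haystack_sort[sorted_idx[mask_idx]]
    # Keep only the in-bounds candidates whose value actually matches
    mask_in_haystack = haystack[idx] == needle[mask_idx]
    mask_idx[mask_idx] &= mask_in_haystack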

Then, using -1 to mark items of needle that are not found in haystack, you
could do:

    ret = np.empty_like(needle, dtype=np.intp)
    ret[~mask_idx] = -1  # not found
    ret[mask_idx] = idx[mask_in_haystack]  # positions of the exact matches
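Putting the masking steps and the -1 fill together, a self-contained sketch might look like this. find_indices is a hypothetical name, and passing kind="stable" to argsort (available in NumPy 1.15 and later) is an addition not in the message: with duplicates in haystack, a stable sort combined with searchsorted's default side="left" returns the lowest matching position, which mirrors list.index() and the "first match" behaviour asked for in the original question.

```python
import numpy as np

def find_indices(haystack, needle):
    """Position of each item of needle in haystack, or -1 if absent.

    Sketch assembling the steps from the message; the stable argsort is an
    assumption added so duplicates resolve to their first position.
    """
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    haystack_sort = np.argsort(haystack, kind="stable")
    # Leftmost insertion points of needle into haystack in sorted order.
    sorted_idx = np.searchsorted(haystack, needle, sorter=haystack_sort)
    # Insertion points equal to len(haystack) would index out of bounds.
    mask_idx = sorted_idx < len(haystack)
    idx = haystack_sort[sorted_idx[mask_idx]]
    # An in-bounds insertion point is a hit only if the value matches.
    mask_in_haystack = haystack[idx] == needle[mask_idx]
    mask_idx[mask_idx] &= mask_in_haystack
    ret = np.empty_like(needle, dtype=np.intp)
    ret[~mask_idx] = -1
    ret[mask_idx] = idx[mask_in_haystack]
    return ret

print(find_indices([10, 30, 20, 30], [30, 25, 40, 10]))  # [ 1 -1 -1  0]
```

Note that 30 occurs at positions 1 and 3 and the sketch reports 1, as list.index() would.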

In the end, it does get kind of messy, but I am not sure how it could be
improved. Perhaps by giving searchsorted an option to report only exact
matches?
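As far as I know searchsorted has no such option, but one workaround (a swapped-in technique, not something proposed in this thread) is to call it twice, with side="left" and side="right": a value occurs in haystack exactly when its two insertion points differ, and in that case the left one is always in bounds. A sketch, with the hypothetical name find_indices_two_sided and the same stable-sort assumption as above:

```python
import numpy as np

def find_indices_two_sided(haystack, needle):
    """Position of each item of needle in haystack, or -1 if absent.

    Sketch: exact matches are detected by comparing the left and right
    insertion points instead of gathering and comparing haystack values.
    """
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    # Stable sort so duplicates resolve to their first original position.
    haystack_sort = np.argsort(haystack, kind="stable")
    left = np.searchsorted(haystack, needle, side="left", sorter=haystack_sort)
    right = np.searchsorted(haystack, needle, side="right", sorter=haystack_sort)
    # left < right means at least one equal element exists in haystack.
    found = left < right
    ret = np.full(needle.shape, -1, dtype=np.intp)
    # Wherever found is True, left < right <= len(haystack), so it is in bounds.
    ret[found] = haystack_sort[left[found]]
    return ret

print(find_indices_two_sided([10, 30, 20, 30], [30, 25, 40, 10]))  # [ 1 -1 -1  0]
```

This trades a second binary search per needle item for not having to index haystack and compare values, and it needs no out-of-bounds masking.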

Jaime

-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him in his
plans for world domination.

