[Numpy-discussion] Finding values in an array

Benjamin Root ben.root at ou.edu
Fri Nov 28 22:26:37 EST 2014


If we don't have an operation for this in numpy's setops module, it
probably should be added.

Ben Root
On Nov 28, 2014 10:21 PM, "Jaime Fernández del Río" <jaime.frio at gmail.com>
wrote:

> On Fri, Nov 28, 2014 at 5:15 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Fri, Nov 28, 2014 at 3:15 AM, Alexander Belopolsky <ndarray at mac.com>
>> wrote:
>> > I am probably missing something very basic, but given two arrays a and b,
>> > how can I find the positions in a where the elements of b are located?
>> > If a were sorted, I could use searchsorted, but I don't want to get valid
>> > positions for elements that are not in a.  In my case, a has unique
>> > elements, but in the general case I would accept the first match.  In
>> > other words, I am looking for an array analog of the list.index() method.
>>
>> How about this?
>>
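>> import numpy as np
>>
>> def index(haystack, needle):
>>     haystack = np.asarray(haystack)
>>     # Permutation that sorts haystack (sorted position -> original position).
>>     haystack_sort = np.argsort(haystack)
>>     haystack_sorted = haystack[haystack_sort]
>>     # Locate each needle in the sorted copy, then map back to haystack.
>>     return haystack_sort[np.searchsorted(haystack_sorted, needle)]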
>>
>> (Note that this will return incorrect results if any entries in needle
>> are missing from haystack entirely. If this is a concern then you need
>> to do some extra error-checking on the searchsorted return value.)
>>
>
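
The extra error checking mentioned in the note above could look roughly like the following. This is only a sketch, not code from the thread: the name index_checked is hypothetical, it assumes a non-empty 1-D haystack, and raising ValueError is borrowed from list.index() rather than from any NumPy function. It also assumes a stable sort is wanted so that duplicates resolve to the first match.

```python
import numpy as np

def index_checked(haystack, needle):
    """Hypothetical helper: positions of needle in haystack, or ValueError."""
    haystack = np.asarray(haystack)  # assumed non-empty and 1-D
    needle = np.asarray(needle)
    # A stable sort keeps equal elements in their original order, so the
    # leftmost hit in sorted order is also the first occurrence in haystack.
    order = np.argsort(haystack, kind="stable")
    pos = np.searchsorted(haystack, needle, sorter=order)
    # searchsorted returns len(haystack) for needles above the maximum;
    # clip so the lookup below cannot go out of bounds.
    pos = np.clip(pos, 0, len(haystack) - 1)
    found = order[pos]
    # A needle is missing if the element at its insertion point differs.
    missing = haystack[found] != needle
    if np.any(missing):
        raise ValueError("values not in haystack: %r" % (needle[missing],))
    return found
```

With haystack [30, 10, 20, 10] and needle [10, 20] this returns positions 1 and 2, while a needle such as 25 or 99 raises instead of silently returning a wrong position.
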
> I like this approach a lot. You can actually skip the creation of the
> haystack_sorted array using the sorter kwarg:
>
>     idx = haystack_sort[np.searchsorted(haystack, needle,
>                                         sorter=haystack_sort)]
>
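
A small check that the two spellings agree may help here. The sample arrays are made up for illustration, and every needle is deliberately present in haystack, so the missing-value problem does not arise in this sketch.

```python
import numpy as np

# Illustrative data (not from the thread); all needles occur in haystack.
haystack = np.array([40, 10, 30, 20])
needle = np.array([20, 40, 10])

haystack_sort = np.argsort(haystack)

# Variant 1: materialize the sorted copy, then search it.
idx_copy = haystack_sort[np.searchsorted(haystack[haystack_sort], needle)]

# Variant 2: let searchsorted apply the permutation itself via `sorter`.
idx_sorter = haystack_sort[np.searchsorted(haystack, needle,
                                           sorter=haystack_sort)]

assert np.array_equal(idx_copy, idx_sorter)
print(idx_sorter)  # [3 0 1]
```

The sorter form saves one temporary array of the same size as haystack; the results are otherwise identical.
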
> But whether or not you use haystack_sorted, if any item in needle is
> larger than the largest entry in haystack, the indexing will fail with
> an index-out-of-bounds error. So the whole thing with proper error checking
> gets kind of messy, something along the lines of:
>
>     sorted_idx = np.searchsorted(haystack, needle, sorter=haystack_sort)
>     mask_idx = sorted_idx < len(haystack)
>     idx = haystack_sort[sorted_idx[mask_idx]]
>     mask_in_haystack = haystack[idx] == needle[mask_idx]
>     mask_idx[mask_idx] &= mask_in_haystack
>
> So using -1 to indicate items in needle not found in haystack, you could
> do:
>
>     ret = np.empty_like(needle, dtype=np.intp)
>     ret[~mask_idx] = -1
>     ret[mask_idx] = idx[mask_in_haystack]
>
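
For convenience, the steps above can be gathered into one function. This is a sketch under a few assumptions: the name find_first is hypothetical, both inputs are taken to be 1-D, a stable argsort is used so that duplicates in haystack resolve to the first occurrence, and the in-place mask update is written with an explicit copy for readability.

```python
import numpy as np

def find_first(haystack, needle):
    """Hypothetical wrapper: index of each needle in haystack, -1 if absent."""
    haystack = np.asarray(haystack)  # assumed 1-D
    needle = np.asarray(needle)      # assumed 1-D
    haystack_sort = np.argsort(haystack, kind="stable")
    sorted_idx = np.searchsorted(haystack, needle, sorter=haystack_sort)
    # Needles larger than every haystack entry get len(haystack); drop them.
    mask_idx = sorted_idx < len(haystack)
    idx = haystack_sort[sorted_idx[mask_idx]]
    # Among the in-range needles, keep only exact matches.
    mask_in_haystack = haystack[idx] == needle[mask_idx]
    found = mask_idx.copy()
    found[mask_idx] = mask_in_haystack
    # -1 marks needles that do not occur in haystack.
    ret = np.full(needle.shape, -1, dtype=np.intp)
    ret[found] = idx[mask_in_haystack]
    return ret
```

For haystack [30, 10, 20, 10] and needle [10, 25, 99, 30, 5] this gives [1, -1, -1, 0, -1]. Since -1 is also a valid NumPy index, callers should test for it before indexing with the result.
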
> In the end, it does get kind of messy, but I am not sure how it could be
> improved. Perhaps by giving searchsorted an option to identify exact
> matches?
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Rabbit. Copy Rabbit into your signature and help him with his
> plans for world domination.
>