[Chicago] Closest Index

JongMan Koo jongman at gmail.com
Tue Jan 8 06:03:50 CET 2013


I guess I'm probably late to the party -- but since your dataset is
relatively modest in size, you can afford to be a little more slow for the
benefit of shorter code. The following function will basically do it:

import numpy as np

def closest_index(a, b):
    idx2 = np.minimum(len(a) - 1, np.searchsorted(a, b))
    idx1 = np.maximum(0, idx2 - 1)
    np.putmask(idx1, np.abs(a[idx1] - b) > np.abs(a[idx2] - b), idx2)

    return idx1

For some explanation.. when you call np.searchsorted(a, b[i]), it will
return an index x where following holds: a[x-1] < b[i] <= a[x]. So after
you get that x, there are only two candidates for b[i]: x-1 or x.

The rest is just vectorizing that using numpy's array manipulation
functions, and some bound checking. The time complexity for this would be
O(m*lgn) which is slightly inferior to O(n+m+mlgm) but numpy functions are
in C and they are blazing fast.

Here's a little more verbose version complete with tests:
https://gist.github.com/4481357



2013/1/5 Oren Livne <livne at uchicago.edu>

> Thanks so much to all of you! I can sort B and sweep through both arrays
> once, as suggest.
> Oren
>
>
> On 1/4/2013 5:20 PM, Rajesh Sankaran wrote:
>
>> What about sorting B ( cost - O(nlog(n)))?  Once sorted, as you advance
>> in B array, your A pointer will only move forward.
>> You will go through each of the two arrays only once (cost O(m+n)).
>>
>> --
>> Raj
>>
>> On Fri, 04 Jan 2013 16:11:25 -0600, Oren Livne <livne at uchicago.edu>
>> wrote:
>>
>>  Dear All,
>>>
>>> I have an sorted array A of size 3.4M of positive integers and an array
>>> B of size 300,000 of positive integers. I would like to output for each x
>>> in B the value in A that is closest to it. This is easy to do for a single
>>> element, but I need an efficient implementation for the entire of B. Any
>>> suggestions?
>>>
>>> Thanks,
>>> Oren
>>>
>>
>
> --
> A person is just about as big as the things that make him angry.
>
> ______________________________**_________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/**mailman/listinfo/chicago<http://mail.python.org/mailman/listinfo/chicago>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/chicago/attachments/20130107/91ea4d43/attachment-0001.html>


More information about the Chicago mailing list