[scikit-learn] Nearest neighbor search with 2 distance measures

Rohin Kumar yrohinkumar at gmail.com
Wed Aug 2 23:42:58 EDT 2017


Dear Jake,

Thank you for your inputs. Had a look at cykdtree. Core implementation of
the algorithm is in C/C++ modifying which is currently beyond my skill.
Will try to contact their team if they entertain special requests. I should
be able fork and modify the sklearn algorithm in cython once my current
project is complete. Currently going ahead with brute-force method. For
now, this thread may be considered closed. Thanks once again!

Regards,
Rohin.



On Tue, Aug 1, 2017 at 11:29 PM, Jacob Vanderplas <jakevdp at cs.washington.edu
> wrote:

> On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar <yrohinkumar at gmail.com>
> wrote:
>
>> I started with KD-tree but after realising it doesn't support custom
>> metrics (I don't know the reason for this - would be nice feature)
>>
>
> The scikit-learn KD-tree doesn't support custom metrics because it
> utilizes relatively strong assumptions about the form of the metric when
> constructing the tree. The Ball Tree makes fewer assumptions, which is why
> it can support arbitrary metrics. It would in principal be possible to
> create a KD Tree that supports custom *axis-aligned* metrics, but again I
> think that would be too specialized for inclusion in scikit-learn.
>
> One project you might check out is cykdtree: https://pypi.python.
> org/pypi/cykdtree
> I'm not certain whether it supports the queries you need, but I would bet
> the team behind that would be willing to work toward these sorts of
> specialized queries if they don't already exist.
>
>    Jake
>
>
>
>
>> I shifted to BallTree and was looking for a 2 metric based
>> categorisation. After looking around, the best I could find at most were
>> brute-force methods written in python (had my own version too) or better
>> optimised ones in C or FORTRAN. The closest one was halotools which again
>> works with euclidean metric. For now, I will try to get my work done with 2
>> different BallTrees iteratively in bins. If I find a better option will try
>> to post an update.
>>
>> Regards,
>> Rohin.
>>
>>
>> On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas <
>> jakevdp at cs.washington.edu> wrote:
>>
>>> Hi Rohin,
>>> Ah, I see. I don't think a BallTree is the right data structure for an
>>> anisotropic N-point query, because it fundamentally assumes spherical
>>> symmetry of the metric. You may be able to do something like this with a
>>> specialized KD-tree, but scikit-learn doesn't support this, and I don't
>>> imagine that it ever will given the very specialized nature of the
>>> application.
>>>
>>> I'm certain someone has written efficient code for this operation in the
>>> astronomy community, but I don't know of any good Python package to
>>> recommend for this – I'd suggest googling for keywords and seeing where
>>> that gets you.
>>>
>>> Thanks,
>>>   Jake
>>>
>>>  Jake VanderPlas
>>>  Senior Data Science Fellow
>>>  Director of Open Software
>>>  University of Washington eScience Institute
>>>
>>> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar <yrohinkumar at gmail.com>
>>> wrote:
>>>
>>>> Since you seem to be from Astrophysics/Cosmology background (I am
>>>> assuming you are jakevdp - the creator of astroML - if you are - I am
>>>> lucky!), I can explain my application scenario. I am trying to calculate
>>>> the anisotropic two-point correlation function something like done in
>>>> rp_pi_tpcf
>>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.rp_pi_tpcf.html#halotools.mock_observables.rp_pi_tpcf>
>>>>  or s_mu_tpcf
>>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.s_mu_tpcf.html#halotools.mock_observables.s_mu_tpcf>
>>>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation
>>>>
>>>> In halotools (http://halotools.readthedocs.
>>>> io/en/latest/function_usage/mock_observables_functions.html) it is
>>>> implemented using rectangular grids. I could calculate 2pcf with custom
>>>> metrics using one variable with BallTree as done in astroML. I intend to
>>>> find the anisotropic counter part.
>>>>
>>>> Thanks & Regards,
>>>> Rohin
>>>>
>>>>
>>>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar <yrohinkumar at gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Jake,
>>>>>
>>>>> Thanks for your response. I meant to group/count pairs in boxes (using
>>>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance
>>>>> array as the binning parameter. I don't know if the algorithm supports such
>>>>> a thing. For now, I am proceeding with your suggestion of two ball trees at
>>>>> huge computational cost. I hope I am able to frame my question properly.
>>>>>
>>>>> Thanks & Regards,
>>>>> Rohin.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas <
>>>>> jakevdp at cs.washington.edu> wrote:
>>>>>
>>>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar <yrohinkumar at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> *update*
>>>>>>>
>>>>>>> May be it doesn't have to be done at the tree creation level. It
>>>>>>> could be using loops and creating two different balltrees. Something like
>>>>>>>
>>>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane
>>>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane
>>>>>>>
>>>>>>> And then calculate correlation functions in a loop to get
>>>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and
>>>>>>> tree2.two_point_correlation(X,r2)
>>>>>>>
>>>>>>
>>>>>> Hi Rohin,
>>>>>> It's not exactly clear to me what you wish the tree to do with the
>>>>>> two different metrics, but in any case the ball tree only supports one
>>>>>> metric at a time. If you can construct your desired result from two ball
>>>>>> trees each with its own metric, then that's probably the best way to
>>>>>> proceed,
>>>>>>    Jake
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> scikit-learn mailing list
>>>>>>> scikit-learn at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20170803/cb0f5db7/attachment-0001.html>


More information about the scikit-learn mailing list