[scikit-learn] Nearest neighbor search with 2 distance measures

Rohin Kumar yrohinkumar at gmail.com
Tue Aug 1 13:50:58 EDT 2017


Dear Jake,

Thank you for your prompt reply. I started with KD-tree but after realising
it doesn't support custom metrics (I don't know the reason for this - would
be nice feature) I shifted to BallTree and was looking for a 2 metric based
categorisation. After looking around, the best I could find at most were
brute-force methods written in python (had my own version too) or better
optimised ones in C or FORTRAN. The closest one was halotools which again
works with euclidean metric. For now, I will try to get my work done with 2
different BallTrees iteratively in bins. If I find a better option will try
to post an update.

Regards,
Rohin.


On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas <jakevdp at cs.washington.edu
> wrote:

> Hi Rohin,
> Ah, I see. I don't think a BallTree is the right data structure for an
> anisotropic N-point query, because it fundamentally assumes spherical
> symmetry of the metric. You may be able to do something like this with a
> specialized KD-tree, but scikit-learn doesn't support this, and I don't
> imagine that it ever will given the very specialized nature of the
> application.
>
> I'm certain someone has written efficient code for this operation in the
> astronomy community, but I don't know of any good Python package to
> recommend for this – I'd suggest googling for keywords and seeing where
> that gets you.
>
> Thanks,
>   Jake
>
>  Jake VanderPlas
>  Senior Data Science Fellow
>  Director of Open Software
>  University of Washington eScience Institute
>
> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar <yrohinkumar at gmail.com> wrote:
>
>> Since you seem to be from Astrophysics/Cosmology background (I am
>> assuming you are jakevdp - the creator of astroML - if you are - I am
>> lucky!), I can explain my application scenario. I am trying to calculate
>> the anisotropic two-point correlation function something like done in
>> rp_pi_tpcf
>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.rp_pi_tpcf.html#halotools.mock_observables.rp_pi_tpcf>
>>  or s_mu_tpcf
>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.s_mu_tpcf.html#halotools.mock_observables.s_mu_tpcf>
>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation
>>
>> In halotools (http://halotools.readthedocs.io/en/latest/function_usage/mo
>> ck_observables_functions.html) it is implemented using rectangular
>> grids. I could calculate 2pcf with custom metrics using one variable with
>> BallTree as done in astroML. I intend to find the anisotropic counter part.
>>
>> Thanks & Regards,
>> Rohin
>>
>>
>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar <yrohinkumar at gmail.com>
>> wrote:
>>
>>> Dear Jake,
>>>
>>> Thanks for your response. I meant to group/count pairs in boxes (using
>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance
>>> array as the binning parameter. I don't know if the algorithm supports such
>>> a thing. For now, I am proceeding with your suggestion of two ball trees at
>>> huge computational cost. I hope I am able to frame my question properly.
>>>
>>> Thanks & Regards,
>>> Rohin.
>>>
>>>
>>>
>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas <
>>> jakevdp at cs.washington.edu> wrote:
>>>
>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar <yrohinkumar at gmail.com>
>>>> wrote:
>>>>
>>>>> *update*
>>>>>
>>>>> May be it doesn't have to be done at the tree creation level. It could
>>>>> be using loops and creating two different balltrees. Something like
>>>>>
>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane
>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane
>>>>>
>>>>> And then calculate correlation functions in a loop to get
>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and
>>>>> tree2.two_point_correlation(X,r2)
>>>>>
>>>>
>>>> Hi Rohin,
>>>> It's not exactly clear to me what you wish the tree to do with the two
>>>> different metrics, but in any case the ball tree only supports one metric
>>>> at a time. If you can construct your desired result from two ball trees
>>>> each with its own metric, then that's probably the best way to proceed,
>>>>    Jake
>>>>
>>>>
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20170801/61fc6f04/attachment-0001.html>


More information about the scikit-learn mailing list