Re: [scikit-learn] Nearest neighbor search with 2 distance measures
Hello! I am currently on vacation and will be back on August 15. -- Умнов Алексей (Alexey Umnov)
*update*
Maybe it doesn't have to be done at the tree-creation level. It could be done with loops and two different BallTrees, something like:

    tree1 = BallTree(X, metric='metric1')  # for x-z plane
    tree2 = BallTree(X, metric='metric2')  # for y-z plane

and then calculate the correlation functions in a loop to get tpcf(X, r1, r2) using tree1.two_point_correlation(X, r1) and tree2.two_point_correlation(X, r2).
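A runnable sketch of the two-tree idea above. It assumes, hypothetically, that 'metric1' and 'metric2' are plain Euclidean distances in the x-z and y-z planes, so each tree can be built on projected columns with BallTree's default (Euclidean) metric instead of a custom one; the random catalogue X is made up for illustration. Each two_point_correlation call returns, for every radius, the cumulative number of pairs within that radius under its own metric only. The two results are therefore marginal counts and cannot by themselves be combined into counts of pairs that satisfy both conditions at once.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(42)
X = rng.random((500, 3))  # hypothetical catalogue: columns x, y, z

# Assumption: "metric1"/"metric2" are Euclidean distances in the
# x-z and y-z planes, so each tree is built on the projected columns.
xz = X[:, [0, 2]]
yz = X[:, [1, 2]]
tree1 = BallTree(xz)  # distances in the x-z plane
tree2 = BallTree(yz)  # distances in the y-z plane

r1 = np.linspace(0.05, 0.3, 6)
r2 = np.linspace(0.05, 0.3, 6)

# For every radius: number of ordered pairs (self-pairs included)
# whose separation in that plane is <= radius.
counts1 = tree1.two_point_correlation(xz, r1)
counts2 = tree2.two_point_correlation(yz, r2)
```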
In reply to Rohin Kumar <yrohinkumar@gmail.com> (Sun, Jul 30, 2017 at 11:18 AM):
Hi Rohin,

It's not exactly clear to me what you want the tree to do with the two different metrics, but in any case the ball tree supports only one metric at a time. If you can construct your desired result from two ball trees, each with its own metric, then that's probably the best way to proceed.

Jake
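One way to "construct the desired result from two ball trees" is sketched below, under the same hypothetical assumption that the two metrics are Euclidean distances in the x-z and y-z planes. query_radius on each tree returns, for every point, the indices of its neighbours under that tree's metric; the intersection of the two index sets contains exactly the pairs that satisfy both distance conditions, which gives a joint cumulative count. The helper name joint_count is illustrative, not a scikit-learn API, and materialising neighbour lists makes it much slower than a single two_point_correlation call.

```python
import numpy as np
from sklearn.neighbors import BallTree


def joint_count(X, r1, r2):
    """Ordered pairs (i != j) with d_xz <= r1 AND d_yz <= r2 (illustrative)."""
    X = np.asarray(X, dtype=float)
    xz, yz = X[:, [0, 2]], X[:, [1, 2]]
    tree1, tree2 = BallTree(xz), BallTree(yz)
    # One array of neighbour indices per query point, for each metric.
    nb1 = tree1.query_radius(xz, r=r1)
    nb2 = tree2.query_radius(yz, r=r2)
    total = 0
    for i, (a, b) in enumerate(zip(nb1, nb2)):
        # Pairs close in BOTH planes.
        both = np.intersect1d(a, b, assume_unique=True)
        total += np.count_nonzero(both != i)  # exclude the self-pair
    return total
```

Evaluating joint_count on a grid of (r1, r2) values yields cumulative 2D counts; the cost grows with the number of neighbours per point, so small radii keep it manageable.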
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
In reply to Jacob Vanderplas <jakevdp@cs.washington.edu> (Mon, Jul 31, 2017 at 8:16 PM):

Dear Jake,

Thanks for your response. I meant to group/count pairs in boxes, using two arrays simultaneously (hence the need for two metrics), instead of a single distance array as the binning parameter. I don't know whether the algorithm supports such a thing. For now, I am proceeding with your suggestion of two ball trees, at a huge computational cost. I hope I have framed my question properly.

Thanks & Regards,
Rohin.
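Counting pairs in two-dimensional "boxes" can be sketched with a single tree plus a 2D histogram. Assumptions not taken from the thread: a plane-parallel geometry with the line of sight along z, so rp is the Euclidean separation in the x-y plane and pi = |dz|; the function name rp_pi_pair_counts is hypothetical. The tree restricts candidate pairs to those with rp below the largest bin edge, and np.histogram2d then does the two-variable binning.

```python
import numpy as np
from sklearn.neighbors import BallTree


def rp_pi_pair_counts(X, rp_bins, pi_bins):
    """Count unique pairs (i < j) in 2D (rp, pi) boxes (illustrative helper).

    Assumes the line of sight is the z axis (column 2): rp is the Euclidean
    separation in the x-y plane and pi = |dz|. rp_bins and pi_bins are
    increasing arrays of bin edges.
    """
    X = np.asarray(X, dtype=float)
    xy = X[:, :2]
    tree = BallTree(xy)  # default metric: Euclidean in the x-y plane
    # Candidate pairs: everything within the largest rp edge.
    ind, dist = tree.query_radius(xy, r=rp_bins[-1], return_distance=True)
    rp_parts, pi_parts = [], []
    for i, (idx, d) in enumerate(zip(ind, dist)):
        keep = idx > i  # each unordered pair once, no self-pairs
        rp_parts.append(d[keep])
        pi_parts.append(np.abs(X[idx[keep], 2] - X[i, 2]))
    rp_all = np.concatenate(rp_parts)
    pi_all = np.concatenate(pi_parts)
    # Pairs falling outside the rp or pi edge ranges are ignored by histogram2d.
    counts, _, _ = np.histogram2d(rp_all, pi_all, bins=[rp_bins, pi_bins])
    return counts
```

The same pattern works for other pair-separation decompositions: only the lines computing the two per-pair quantities change.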
In reply to Rohin Kumar <yrohinkumar@gmail.com> (Tue, Aug 1, 2017 at 5:18 PM):

Since you seem to be from an astrophysics/cosmology background (I am assuming you are jakevdp, the creator of astroML; if so, I am lucky!), let me explain my application scenario. I am trying to calculate the anisotropic two-point correlation function, as is done in rp_pi_tpcf <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.rp_...> or s_mu_tpcf <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.s_m...>, using pair counts (DD, DR, RR) calculated with BallTree.two_point_correlation.

In halotools (http://halotools.readthedocs.io/en/latest/function_usage/mock_observables_functions.html) it is implemented using rectangular grids. I was able to calculate the 2PCF with custom metrics for a single separation variable using BallTree, as done in astroML. I intend to find the anisotropic counterpart.

Thanks & Regards,
Rohin

Y.Rohin Kumar, +919818092877.
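The thread does not say which estimator combines DD, DR and RR. As an assumption, here is the common Landy-Szalay form, xi = (DD - 2 DR + RR) / RR, with each count normalised by its number of possible pairs. It works bin by bin, so the same hypothetical helper serves a 1D, (rp, pi) or (s, mu) grid. It expects DD and RR to count unique pairs (i < j) and DR to count all data-random cross pairs; raw BallTree.two_point_correlation output on a catalogue against itself counts ordered pairs including self-pairs, so it would need the N self-pairs subtracted and the result halved first.

```python
import numpy as np


def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy-Szalay estimate of xi in each bin (illustrative helper).

    dd, rr: unique-pair counts within the data / random catalogues.
    dr: data-random cross-pair counts. All three share the same shape.
    """
    dd = np.asarray(dd, dtype=float) / (n_data * (n_data - 1) / 2.0)
    dr = np.asarray(dr, dtype=float) / (n_data * n_random)
    rr = np.asarray(rr, dtype=float) / (n_random * (n_random - 1) / 2.0)
    # Bins with no random pairs give inf/nan; suppress the warnings only.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (dd - 2.0 * dr + rr) / rr
```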
In reply to Rohin Kumar <yrohinkumar@gmail.com> (Tue, Aug 1, 2017 at 6:15 AM):

Hi Rohin,

Ah, I see. I don't think a BallTree is the right data structure for an anisotropic N-point query, because it fundamentally assumes spherical symmetry of the metric. You may be able to do something like this with a specialized KD-tree, but scikit-learn doesn't support this, and I don't imagine that it ever will, given the very specialized nature of the application.

I'm certain someone has written efficient code for this operation in the astronomy community, but I don't know of any good Python package to recommend for this – I'd suggest googling for keywords and seeing where that gets you.

Thanks,
Jake

Jake VanderPlas
Senior Data Science Fellow
Director of Open Software
University of Washington eScience Institute
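One way to see why each tree is tied to a single metric (this is the standard ball-tree pruning argument, added for context rather than taken from the thread): for a node with centre c and radius R, meaning every point x in the node has d(x, c) <= R, the triangle inequality bounds the distance from any query point q to every point in the node:

```latex
\max\bigl(0,\; d(q, c) - R\bigr) \;\le\; d(q, x) \;\le\; d(q, c) + R
\qquad \text{for all } x \text{ with } d(x, c) \le R .
```

When this whole interval lies below a search radius r, all pairs with that node are counted at once; when it lies above r, the node is skipped. Both bounds concern one scalar distance d, so a query that bins on two separations at once, such as (rp, pi), needs bounds on both quantities, which a tree built for one metric does not provide directly.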
In reply to Jacob Vanderplas <jakevdp@cs.washington.edu> (Tue, Aug 1, 2017 at 10:55 PM):

Dear Jake,

Thank you for your prompt reply. I started with KD-tree, but after realising it doesn't support custom metrics (I don't know the reason for this; it would be a nice feature), I shifted to BallTree and was looking for a categorisation based on two metrics. After looking around, the best I could find were brute-force methods written in Python (I had my own version too) or better-optimised ones in C or FORTRAN. The closest was halotools, which again works with the Euclidean metric. For now, I will try to get my work done with two different BallTrees, iterating over bins. If I find a better option, I will try to post an update.

Regards,
Rohin.
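Working "iteratively in bins" with cumulative counts reduces to a 2D difference. A sketch, assuming C[k, l] holds the number of pairs with rp <= rp_edges[k] and pi <= pi_edges[l] (for example, a joint pair count evaluated on a grid of radius pairs); the helper name cumulative_to_bins is hypothetical. By inclusion-exclusion, the count in box (k, l) is C[k+1, l+1] - C[k, l+1] - C[k+1, l] + C[k, l].

```python
import numpy as np


def cumulative_to_bins(C):
    """Turn cumulative 2D pair counts into per-box counts (illustrative).

    C[k, l] = number of pairs with rp <= rp_edges[k] and pi <= pi_edges[l].
    Entry [k, l] of the result counts pairs with
    rp_edges[k] < rp <= rp_edges[k + 1] and pi_edges[l] < pi <= pi_edges[l + 1].
    """
    C = np.asarray(C)
    # Difference along rp, then along pi: inclusion-exclusion on the grid.
    return np.diff(np.diff(C, axis=0), axis=1)
```

Taking the first edges as 0 (with C = 0 there) makes the first row and column of boxes start at zero separation.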
On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar <yrohinkumar@gmail.com> wrote:

> I started with KD-tree but after realising it doesn't support custom metrics (I don't know the reason for this - would be nice feature)

The scikit-learn KD-tree doesn't support custom metrics because it relies on relatively strong assumptions about the form of the metric when constructing the tree. The Ball Tree makes fewer assumptions, which is why it can support arbitrary metrics. It would in principle be possible to create a KD Tree that supports custom *axis-aligned* metrics, but again I think that would be too specialized for inclusion in scikit-learn.

One project you might check out is cykdtree: https://pypi.python.org/pypi/cykdtree. I'm not certain whether it supports the queries you need, but I would bet the team behind it would be willing to work toward these sorts of specialized queries if they don't already exist.

Jake
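To illustrate the point that the Ball Tree accepts arbitrary metrics, here is a sketch that passes a user-defined Python distance function through metric='pyfunc' with the func keyword. The function stretched_distance and its weight of 2 on the z axis are hypothetical. The function must be a true metric (in particular, satisfy the triangle inequality) for the tree's pruning to be correct, and Python-level callbacks are slow, so this approach suits small data sets.

```python
import numpy as np
from sklearn.neighbors import BallTree


def stretched_distance(a, b):
    # Hypothetical metric: Euclidean, with separations along z weighted by 2.
    # A positive weight on one axis keeps it a true metric.
    d = a - b
    return np.sqrt(d[0] ** 2 + d[1] ** 2 + (2.0 * d[2]) ** 2)


rng = np.random.default_rng(7)
X = rng.random((200, 3))

tree = BallTree(X, metric="pyfunc", func=stretched_distance)
# Number of neighbours (self included) within 0.25 of each point.
n_within = tree.query_radius(X, r=0.25, count_only=True)
```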
In reply to Jacob Vanderplas <jakevdp@cs.washington.edu> (Tue, Aug 1, 2017 at 11:29 PM):

Dear Jake,

Thank you for your inputs. I had a look at cykdtree. The core implementation of the algorithm is in C/C++, and modifying it is currently beyond my skill. I will try to contact their team to see whether they entertain special requests. I should be able to fork and modify the sklearn algorithm in Cython once my current project is complete. For now, I am going ahead with a brute-force method, and this thread may be considered closed.

Thanks once again!

Regards,
Rohin.
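A vectorised brute-force pair counter of the kind mentioned above can be sketched in NumPy for the (s, mu) binning used by s_mu_tpcf. Assumptions not taken from the thread: a plane-parallel line of sight along z, so mu = |dz| / s is the cosine of the angle between the pair separation and the line of sight; the function name and its chunk parameter are hypothetical. Rows are processed in chunks so memory stays near chunk * N * 3 floats rather than N * N * 3.

```python
import numpy as np


def s_mu_pair_counts_brute(X, s_bins, mu_bins, chunk=1024):
    """Brute-force unique-pair counts in (s, mu) bins (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    counts = np.zeros((len(s_bins) - 1, len(mu_bins) - 1))
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        d = X[start:stop, None, :] - X[None, :, :]  # (rows, n, 3) separations
        s = np.sqrt(np.sum(d * d, axis=-1))
        i = np.arange(start, stop)[:, None]
        j = np.arange(n)[None, :]
        keep = (j > i) & (s > 0)  # unique pairs; skip coincident points
        s_k = s[keep]
        mu_k = np.abs(d[..., 2][keep]) / s_k  # cosine to the z axis
        h, _, _ = np.histogram2d(s_k, mu_k, bins=[s_bins, mu_bins])
        counts += h
    return counts
```

Brute force scales as N^2, so it mainly serves as a reference to validate a faster tree-based counter on small catalogues.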
participants (3)
- Умнов Алексей (Alexey Umnov)
- Jacob Vanderplas
- Rohin Kumar