How do we define a distance metric's parameter for grid search
Hello,

I posted this question on Stack Overflow and did not get an answer. It seems to be a basic usage question, so I am sending it here.

I have the following code snippet that attempts a grid search in which one of the grid parameters is the distance metric to be used by the KNN algorithm. The example below fails if I use the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.

    # Define the parameter values that should be searched
    k_range = range(1, 31)
    weights = ['uniform', 'distance']
    algos = ['auto', 'ball_tree', 'kd_tree', 'brute']
    leaf_sizes = range(10, 60, 10)
    metrics = ["euclidean", "manhattan", "chebyshev", "minkowski", "mahalanobis"]

    param_grid = dict(n_neighbors=list(k_range),
                      weights=weights,
                      algorithm=algos,
                      leaf_size=list(leaf_sizes),
                      metric=metrics)
    param_grid

    # Instantiate the algorithm
    knn = KNeighborsClassifier(n_neighbors=10)

    # Instantiate the grid
    grid = GridSearchCV(knn, param_grid=param_grid, cv=10, scoring='accuracy', n_jobs=-1)

    # Fit the models using the grid parameters
    grid.fit(X, y)

I assume this is because I have to set or define the ranges for the various distance parameters (for example p and w for "wminkowski", i.e. WMinkowskiDistance). The "minkowski" distance may be working because its "p" parameter defaults to 2.

So my questions are:

1. Can we set a range of parameter values for the distance metrics in the grid search, and if so, how?
2. Can we set the value of a single parameter for the distance metrics in the grid search, and if so, how?

I hope the question is clear. TIA
I did something similar where I was using GridSearchCV over different kernel functions for SVM, and not all kernel functions use the same parameters. For example, the *degree* parameter is only used by the *poly* kernel.

    from sklearn import svm
    from sklearn import cross_validation
    from sklearn import grid_search

    params = [{'kernel':['poly'],'degree':[1,2,3],'gamma':[1/p,1,2],'coef0':[-1,0,1]},
              {'kernel':['rbf'],'gamma':[1/p,1,2],'degree':[3],'coef0':[0]},
              {'kernel':['sigmoid'],'gamma':[1/p,1,2],'coef0':[-1,0,1],'degree':[3]}]
    GSC = grid_search.GridSearchCV(estimator = svm.SVC(), param_grid = params,
                                   cv = cvrand, n_jobs = -1)

This worked in this instance because the svm.SVC() object only passes parameters to the kernel functions as needed:

[image: Inline image 1]

Hence, even though my list of dicts includes all three parameters for all types of kernels I used, they were selectively ignored. I'm not sure about parameters for the distance metrics for the KNN object, but it's a good bet it works the same way.

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
Editor-in-Chief, European Journal of Mathematical Sciences
Executive Editor, European Journal of Pure and Applied Mathematics
www.andrewhowe.com
http://www.linkedin.com/in/ahowe42
https://www.researchgate.net/profile/John_Howe12/
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

(In reply to Hugo Ferreira's message of Mon, Jun 27, 2016 at 1:27 PM.)
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Hi Hugo,

Andrew's approach -- using a list of dicts to specify multiple parameter grids -- is the correct one.

However, Andrew, you don't need to include parameters that will be ignored in your parameter grid. The following will be effectively the same:

    params = [{'kernel':['poly'],'degree':[1,2,3],'gamma':[1/p,1,2],'coef0':[-1,0,1]},
              {'kernel':['rbf'],'gamma':[1/p,1,2]},
              {'kernel':['sigmoid'],'gamma':[1/p,1,2],'coef0':[-1,0,1]}]

Joel

(In reply to Andrew Howe's message of 27 June 2016 at 20:59.)
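As a rough illustration of the list-of-dicts idea: such a grid is expanded one dict at a time, so each kernel is only combined with the parameters listed next to it. The plain-Python sketch below is not scikit-learn code. `expand_grid` is a hypothetical helper written for this example, and `inv_p` is a made-up stand-in for the thread's undefined `1/p`. It only counts the candidate settings the two grids from the thread would produce.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict, or a list of dicts, into individual candidate settings.

    Hypothetical helper for illustration only; it imitates how a
    list-of-dicts grid is enumerated, one dict at a time.
    """
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    candidates = []
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the value lists within this one dict only.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


inv_p = 0.25  # assumption: placeholder for the undefined 1/p in the thread

# Grid with the ignored parameters pinned to single values.
explicit = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [inv_p, 1, 2], 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': [inv_p, 1, 2], 'degree': [3], 'coef0': [0]},
    {'kernel': ['sigmoid'], 'gamma': [inv_p, 1, 2], 'coef0': [-1, 0, 1], 'degree': [3]},
]

# Grid with the ignored parameters left out.
trimmed = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [inv_p, 1, 2], 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': [inv_p, 1, 2]},
    {'kernel': ['sigmoid'], 'gamma': [inv_p, 1, 2], 'coef0': [-1, 0, 1]},
]

explicit_candidates = expand_grid(explicit)
trimmed_candidates = expand_grid(trimmed)
print(len(explicit_candidates), len(trimmed_candidates))  # 39 39
```

Both grids give 39 candidates (27 for poly, 3 for rbf, 9 for sigmoid). The extra entries in the longer grid each hold a single value, and for SVC those values (degree=3, coef0=0) match the defaults, so leaving them out should not change the search.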
Yeah I know :-). I did it like that for a specific reason which I no longer remember :-D. But, you know, it was probably a good one...hahaha

Andrew

(In reply to Joel Nothman's message of Mon, Jun 27, 2016 at 2:37 PM.)
Hi Andrew and Joel.

I am going to give this a go.

Thanks,
Hugo

(In reply to Joel Nothman's message of 27-06-2016 12:37.)
Hello,

(In reply to Joel Nothman's message of 27-06-2016 12:37.)

I tried to do this but am getting errors. It seems I need to use the 'metric_params' parameter, but I cannot get it right. Here are some of the attempts I made:

    {'metric': ['wminkowski'],
     'metric_params': [{'w': [0.01, 0.1, 1, 10, 100], 'p': [1,2,3,4,5]}],
     'n_neighbors': list(k_range),
     'weights': weights,
     'algorithm': algos,
     'leaf_size': list(leaf_sizes)}

    {'metric': ['wminkowski'],
     'metric_params': [{'w': 0.01, 'p': 1}],
     'n_neighbors': list(k_range),
     'weights': weights,
     'algorithm': algos,
     'leaf_size': list(leaf_sizes)}

    {'metric': ['wminkowski'],
     'metric_params': [dict(w=0.01, p=1)],
     'n_neighbors': list(k_range),
     'weights': weights,
     'algorithm': algos,
     'leaf_size': list(leaf_sizes)}

The last two give me the following error:

    Exception ignored in: 'sklearn.neighbors.dist_metrics.get_vec_ptr'
    ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

Can anyone see what I am doing wrong?

TIA,
I can see *something* you're doing wrong. Firstly, your second and third examples produce identical Python objects.

But in metric_params, p should be an integer and w should be a 1-dimensional array. In your first example, both p and w will be 1d, and in your second and third, both are scalars. You want something like

    ... 'metric_params': [{'w': [0.01, 0.1, 1, 10, 100], 'p': 1}] ...

except that those values for 'w' seem a bit strange for weights (are you sure you want wminkowski?).

You can try multiple 'p' with

    'metric_params': [{'w': weights, 'p': 1}, {'w': weights, 'p': 2}, {'w': weights, 'p': 3}, ...]
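To make the shape of that suggestion concrete: grid search treats every element of the `metric_params` list as one opaque candidate value and does not expand lists nested inside the dict. A minimal plain-Python sketch follows. The name `feature_weights` and its values are assumptions made up for this example, chosen to avoid confusion with the KNN `weights` option used earlier in the thread.

```python
# Hypothetical per-feature weights; the name and values are illustrative only.
# The thread later reports that 'w' had to be a NumPy array rather than a
# list, so in real use this would be wrapped accordingly.
feature_weights = [1.0, 2.0, 0.5]

# Same shape as the first attempt in the thread: ONE candidate, in which
# 'p' arrives at the metric as a whole list instead of a single integer.
first_attempt = [{'w': feature_weights, 'p': [1, 2, 3]}]

# One complete metric_params dict per value of p: THREE candidates.
per_p = [{'w': feature_weights, 'p': p} for p in (1, 2, 3)]

grid_entry = {
    'metric': ['wminkowski'],
    'metric_params': per_p,
}

print(len(first_attempt), len(per_p))  # 1 3
```

Searching over several weight vectors as well would follow the same pattern: one dict per (w, p) combination, built up front rather than left for the grid to expand.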
Hi,

On 28-06-2016 12:45, Joel Nothman wrote:
> I can see *something* you're doing wrong. Firstly, your second and third examples produce identical Python objects.

Yeah. It's called desperation :-)

> But in metric_params, p should be an integer, w should be a 1-dimensional array. In your first example, both p and w will be 1d, and in your second and third, both are scalars. You want something like
>
>     ... 'metric_params': [{'w': [0.01, 0.1, 1, 10, 100], 'p': 1}] ...
>
> except that those values for 'w' seem a bit strange for weights (are you sure you want wminkowski?).

Just testing the code. I'll need to learn what values are the most appropriate here. Are these the weights to be applied to each feature (number of weights = number of features)? I wonder how I can use this during feature selection.

> You can try multiple 'p' with
>
>     'metric_params': [{'w': weights, 'p': 1}, {'w': weights, 'p': 2}, {'w': weights, 'p': 3}, ...]

I have used the simplest case and set of parameters as follows (before attempting multiple parameters as you have shown above):

    param_grid = [
        {'metric': ['wminkowski'],
         'metric_params': [{'w': [10, 20], 'p': 1}]}
    ]

and I get the error:

    File "<string>", line unknown
    SyntaxError: invalid or missing encoding declaration for '/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/ball_tree.cpython-34m.so'

OK, so this may be due to the specific type of tree being used. I then set the parameters to:

    {'metric': ['wminkowski'],
     'metric_params': [{'w': [10.0, 20.0], 'p': 1}],
     'algorithm': algos}

where algos is:

    algos = ['brute']

which results in the following error:

    AttributeError: 'list' object has no attribute 'dtype'

So it seems we need to use an array explicitly. The following works:

    {'metric': ['wminkowski'],
     'metric_params': [{'w': np.array([10.0, 20.0]), 'p': 1}],
     'algorithm': algos}

Thanks for the help.

Hugo
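On the open question of whether the weights apply per feature: the sketch below assumes the definition given for "wminkowski" in the scikit-learn DistanceMetric documentation of that period, sum(|w * (x - y)|^p)^(1/p). Under that assumption, w carries one weight per feature. The function is a plain-Python stand-in written for this example, not the library implementation, and the sample points are made up.

```python
def wminkowski(x, y, w, p):
    """Weighted Minkowski distance, assuming sum(|w * (x - y)|^p)^(1/p).

    Illustrative re-implementation only; x, y and w must all have one
    entry per feature.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(abs(wi * (xi - yi)) ** p for xi, yi, wi in zip(x, y, w))
    return total ** (1.0 / p)


# Two made-up points with two features, using the weights from the
# working grid above (w = [10.0, 20.0], p = 1).
a = [0.0, 0.0]
b = [1.0, 1.0]
weighted = wminkowski(a, b, w=[10.0, 20.0], p=1)
unweighted = wminkowski(a, b, w=[1.0, 1.0], p=1)
print(weighted, unweighted)  # 30.0 2.0
```

With these weights the second feature contributes twice as much to the distance as the first, which is why a weight vector whose length differs from the number of features would not make sense here.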
participants (3)
- Andrew Howe
- Hugo Ferreira
- Joel Nothman