[scikit-learn] How do we define a distance metric's parameter for grid search

Hugo Ferreira hmf at inesctec.pt
Mon Jun 27 06:27:22 EDT 2016


Hello,

I have posted this question on Stack Overflow and did not get an answer.
This seems to be a basic usage question, so I am sending it here.

I have the following code snippet that attempts to do a grid search in
which one of the grid parameters is the distance metric to be used by the
KNN algorithm. The example below fails if I use the "wminkowski",
"seuclidean" or "mahalanobis" distance metrics.

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# X, y: feature matrix and class labels, assumed to be loaded already

# Define the parameter values that should be searched
k_range    = range(1, 31)
weights    = ['uniform', 'distance']
algos      = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_sizes = range(10, 60, 10)
metrics    = ['euclidean', 'manhattan', 'chebyshev', 'minkowski',
              'mahalanobis']

param_grid = dict(n_neighbors=list(k_range), weights=weights,
                  algorithm=algos, leaf_size=list(leaf_sizes),
                  metric=metrics)

# Instantiate the algorithm
knn = KNeighborsClassifier(n_neighbors=10)

# Instantiate the grid
grid = GridSearchCV(knn, param_grid=param_grid, cv=10,
                    scoring='accuracy', n_jobs=-1)

# Fit the models using the grid parameters
grid.fit(X, y)

I assume this is because I have to set or define the ranges for the
various distance parameters (for example, p and w for "wminkowski", i.e.
WMinkowskiDistance). The "minkowski" metric probably works because its
"p" parameter defaults to 2.
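For reference, KNeighborsClassifier splits metric arguments in two: the
Minkowski exponent is the top-level p argument, while other metrics take
their extra arguments through the metric_params dict. The sketch below
assumes a recent scikit-learn, uses iris as a stand-in for X, y, and
estimates Mahalanobis' inverse covariance VI and seuclidean's variance
vector V from all of X (both are illustrative choices, not requirements).
"wminkowski" is left out because its availability depends on the installed
scikit-learn/SciPy version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the poster's X, y.
X, y = load_iris(return_X_y=True)

# Minkowski's exponent is a top-level constructor argument, p (default 2).
knn_minkowski = KNeighborsClassifier(n_neighbors=10, metric="minkowski", p=1)

# Mahalanobis needs an inverse covariance matrix VI, passed via metric_params.
# Assumption: VI is estimated from all of X purely for illustration.
VI = np.linalg.inv(np.cov(X, rowvar=False))
knn_mahalanobis = KNeighborsClassifier(
    n_neighbors=10, algorithm="brute",
    metric="mahalanobis", metric_params={"VI": VI})

# Standardized Euclidean needs a per-feature variance vector V.
knn_seuclidean = KNeighborsClassifier(
    n_neighbors=10, algorithm="brute",
    metric="seuclidean", metric_params={"V": X.var(axis=0)})

scores = {}
for name, model in [("minkowski", knn_minkowski),
                    ("mahalanobis", knn_mahalanobis),
                    ("seuclidean", knn_seuclidean)]:
    model.fit(X, y)
    # Training accuracy, only to show that each fit succeeds.
    scores[name] = model.score(X, y)

print(scores)
```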

So my questions are:

1. Can we specify a range of values for a distance metric's parameters in
the grid search, and if so, how?
2. Can we set a fixed value for a distance metric's parameters in the grid
search, and if so, how?
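Both questions seem to map onto how GridSearchCV reads param_grid: it
accepts a list of dicts, so each metric can carry its own sub-grid. A range
is just a list of candidate values (here Minkowski's p), and a fixed metric
argument is a one-element list under metric_params. This is a sketch
assuming sklearn.model_selection.GridSearchCV, with iris standing in for
X, y and a shortened k list and cv=5 to keep it quick. VI and V are
computed on all of X, which leaks information across folds; estimating
them per training fold would be stricter.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris stands in for the poster's X, y.
X, y = load_iris(return_X_y=True)

# Fixed metric arguments (estimated on all of X for illustration only).
VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance for mahalanobis
feature_var = X.var(axis=0)                  # per-feature variances for seuclidean

k_values = [1, 5, 11, 21]  # shortened from range(1, 31) to keep the sketch fast

param_grid = [
    # Question 1: search over a range of a metric parameter (Minkowski's p).
    {"metric": ["minkowski"], "p": [1, 2, 3], "n_neighbors": k_values},
    # Question 2: pin metric arguments with a one-element list of dicts.
    {"metric": ["mahalanobis"], "metric_params": [{"VI": VI}],
     "algorithm": ["brute"], "n_neighbors": k_values},
    {"metric": ["seuclidean"], "metric_params": [{"V": feature_var}],
     "algorithm": ["brute"], "n_neighbors": k_values},
]

grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid,
                    cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_["metric"], grid.best_score_)
```

Listing several dicts under metric_params (for example, two different V
vectors) would search over those values in the same way.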

Hope the question is clear.
TIA

