[scikit-learn] How do we define a distance metric's parameter for grid search

Andrew Howe ahowe42 at gmail.com
Mon Jun 27 07:58:52 EDT 2016


Yeah I know :-).  I did it like that for a specific reason which I no
longer remember :-D.  But, you know, it was probably a good one...hahaha

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
Editor-in-Chief, European Journal of Mathematical Sciences
Executive Editor, European Journal of Pure and Applied Mathematics
www.andrewhowe.com
http://www.linkedin.com/in/ahowe42
https://www.researchgate.net/profile/John_Howe12/
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Mon, Jun 27, 2016 at 2:37 PM, Joel Nothman <joel.nothman at gmail.com>
wrote:

> Hi Hugo,
>
> Andrew's approach -- using a list of dicts to specify multiple parameter
> grids -- is the correct one.
>
> However, Andrew, you don't need to include parameters that will be ignored
> in your parameter grid. The following is effectively equivalent:
>
> params = [
>     {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]},
>     {'kernel': ['rbf'], 'gamma': [1/p, 1, 2]},
>     {'kernel': ['sigmoid'], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]},
> ]
>
> Joel
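Joel's claim that the trimmed grid is effectively the same as Andrew's can be checked by counting the candidates each one produces. A minimal sketch, assuming a current scikit-learn with sklearn.model_selection.ParameterGrid, and taking p = 4 purely for illustration (in the thread p is defined elsewhere):

```python
from sklearn.model_selection import ParameterGrid

p = 4  # assumed number of features, for illustration only

# Andrew's grid: each dict is padded with single values for ignored parameters
andrew = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1.0 / p, 1, 2], 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': [1.0 / p, 1, 2], 'degree': [3], 'coef0': [0]},
    {'kernel': ['sigmoid'], 'gamma': [1.0 / p, 1, 2], 'coef0': [-1, 0, 1], 'degree': [3]},
]

# Joel's grid: each dict lists only the parameters its kernel uses
joel = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1.0 / p, 1, 2], 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': [1.0 / p, 1, 2]},
    {'kernel': ['sigmoid'], 'gamma': [1.0 / p, 1, 2], 'coef0': [-1, 0, 1]},
]

# A list of dicts is expanded one dict at a time and the results concatenated,
# so values from one dict are never crossed with another dict's values.
n_andrew = len(ParameterGrid(andrew))
n_joel = len(ParameterGrid(joel))
print(n_andrew, n_joel)  # 39 39  (27 poly + 3 rbf + 9 sigmoid)
```

Padding a dict with single-valued, ignored parameters does not multiply the number of candidates; it only adds redundant keys to each one.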
>
> On 27 June 2016 at 20:59, Andrew Howe <ahowe42 at gmail.com> wrote:
>
>> I did something similar where I was using GridSearchCV over different
>> kernel functions for SVM and not all kernel functions use the same
>> parameters.  For example, the *degree* parameter is only used by the
>> *poly* kernel.
>>
>> from sklearn import svm
>> from sklearn.model_selection import GridSearchCV  # was sklearn.grid_search before 0.18
>>
>> # p (presumably the number of features) and cvrand (a CV splitter) are defined elsewhere
>> params = [
>>     {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1]},
>>     {'kernel': ['rbf'], 'gamma': [1/p, 1, 2], 'degree': [3], 'coef0': [0]},
>>     {'kernel': ['sigmoid'], 'gamma': [1/p, 1, 2], 'coef0': [-1, 0, 1], 'degree': [3]},
>> ]
>> GSC = GridSearchCV(estimator=svm.SVC(), param_grid=params, cv=cvrand, n_jobs=-1)
>>
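>> This worked in this instance because the svm.SVC() object only passes
>> each parameter to the kernel functions that use it. Per the SVC
>> documentation, degree is ignored by all kernels other than poly, and coef0
>> is only significant for poly and sigmoid.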
>>
>> Hence, even though my list of dicts includes all three parameters for all
>> types of kernels I used, they were selectively ignored.  I'm not sure about
>> parameters for the distance metrics for the KNN object, but it's a good bet
>> it works the same way.
>>
>> Andrew
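Andrew's observation that SVC silently ignores parameters its kernel does not use can be checked directly. A minimal sketch, using the iris dataset as stand-in data and an arbitrary gamma (both assumptions, not from the thread): two RBF models that differ only in degree and coef0 should produce identical decision functions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same RBF kernel and gamma; only degree and coef0 differ,
# and neither appears in the RBF kernel exp(-gamma * ||x - x'||^2).
a = SVC(kernel='rbf', gamma=0.25, degree=2, coef0=0.0).fit(X, y)
b = SVC(kernel='rbf', gamma=0.25, degree=7, coef0=1.0).fit(X, y)

same = np.allclose(a.decision_function(X), b.decision_function(X))
print(same)  # True
```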
>>
>> On Mon, Jun 27, 2016 at 1:27 PM, Hugo Ferreira <hmf at inesctec.pt> wrote:
>>
>>> Hello,
>>>
>>> I posted this question on Stack Overflow and did not get an answer.
>>> This seems to be a basic usage question, so I am sending it here.
>>>
>>> I have the following code snippet that attempts to do a grid search in
>>> which one of the grid parameters is the distance metric used by the KNN
>>> algorithm. The example below fails if I use the "wminkowski", "seuclidean"
>>> or "mahalanobis" distance metrics.
>>>
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.neighbors import KNeighborsClassifier
>>>
>>> # Define the parameter values that should be searched
>>> k_range    = range(1, 31)
>>> weights    = ['uniform', 'distance']
>>> algos      = ['auto', 'ball_tree', 'kd_tree', 'brute']
>>> leaf_sizes = range(10, 60, 10)
>>> metrics    = ["euclidean", "manhattan", "chebyshev", "minkowski", "mahalanobis"]
>>>
>>> param_grid = dict(n_neighbors=list(k_range), weights=weights,
>>>                   algorithm=algos, leaf_size=list(leaf_sizes), metric=metrics)
>>>
>>> # Instantiate the algorithm
>>> knn = KNeighborsClassifier(n_neighbors=10)
>>>
>>> # Instantiate the grid
>>> grid = GridSearchCV(knn, param_grid=param_grid, cv=10,
>>>                     scoring='accuracy', n_jobs=-1)
>>>
>>> # Fit the models using the grid parameters (X, y: my data, not shown)
>>> grid.fit(X, y)
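With cv=10, the flat grid above means 6000 candidates and 60,000 fits, and, just as Joel notes for SVC, some candidates differ only in a parameter that is ignored: leaf_size is only passed to BallTree or KDTree, so it has no effect with algorithm='brute'. A minimal sketch counting candidates with ParameterGrid (assuming a current scikit-learn), with the grid split into a list of dicts so that 'brute' gets no leaf_size axis:

```python
from sklearn.model_selection import ParameterGrid

k_range = list(range(1, 31))
weights = ['uniform', 'distance']
leaf_sizes = list(range(10, 60, 10))
metrics = ['euclidean', 'manhattan', 'chebyshev', 'minkowski', 'mahalanobis']

# Flat grid as in the question: every algorithm is crossed with every leaf_size
flat = dict(n_neighbors=k_range, weights=weights,
            algorithm=['auto', 'ball_tree', 'kd_tree', 'brute'],
            leaf_size=leaf_sizes, metric=metrics)

# Split grid: leaf_size only for algorithms that may build a tree
split = [
    dict(n_neighbors=k_range, weights=weights,
         algorithm=['auto', 'ball_tree', 'kd_tree'],
         leaf_size=leaf_sizes, metric=metrics),
    dict(n_neighbors=k_range, weights=weights,
         algorithm=['brute'], metric=metrics),
]

print(len(ParameterGrid(flat)), len(ParameterGrid(split)))  # 6000 4800
```

'auto' keeps its leaf_size axis because it may select a tree-based algorithm.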
>>>
>>> I assume this is because I have to set or define the ranges for the
>>> various distance parameters (for example, p and w for "wminkowski", i.e.
>>> WMinkowskiDistance). The "minkowski" distance may be working because its
>>> "p" parameter defaults to 2.
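The "minkowski" metric needs no extra argument because p defaults to 2, and since p is an ordinary KNeighborsClassifier constructor argument, a range of values can be searched directly. A minimal sketch, assuming a current scikit-learn (sklearn.model_selection) and using iris as stand-in data; following Joel's advice, p is listed only in the dict where it matters:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data (assumption)

# p only affects the minkowski metric, so it gets its own dict
param_grid = [
    {'metric': ['euclidean', 'manhattan', 'chebyshev'], 'n_neighbors': [5, 10]},
    {'metric': ['minkowski'], 'p': [1, 2, 3, 4], 'n_neighbors': [5, 10]},
]

grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid,
                    cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```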
>>>
>>> So my questions are:
>>>
>>> 1. Can we set a range of values for the distance metrics' parameters in
>>>    the grid search, and if so, how?
>>> 2. Can we set a single value for a distance metric's parameter in the
>>>    grid search, and if so, how?
>>>
>>> Hope the question is clear.
>>> TIA
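For both questions, the arguments a metric needs can go in KNeighborsClassifier's metric_params dict, which is an ordinary estimator parameter and can therefore appear in the grid: a one-element list fixes a value, several dicts search a range. A minimal sketch, assuming a current scikit-learn and iris as stand-in data. algorithm='brute' is pinned for these metrics as a precaution, and mahalanobis is given the inverse covariance VI, which both SciPy's cdist and scikit-learn's DistanceMetric accept. "wminkowski" is left out because it has since been deprecated in favour of "minkowski" with a 'w' entry in metric_params.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data (assumption)

# Data-derived metric arguments. For simplicity they are estimated on all of X,
# which leaks a little information across the CV folds.
variances = X.var(axis=0, ddof=1)                   # 'V' for seuclidean
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))    # 'VI' for mahalanobis

param_grid = [
    {'metric': ['euclidean', 'manhattan'], 'n_neighbors': [5, 10]},
    # Each candidate value of metric_params is a whole dict
    {'metric': ['seuclidean'], 'metric_params': [{'V': variances}],
     'algorithm': ['brute'], 'n_neighbors': [5, 10]},
    {'metric': ['mahalanobis'], 'metric_params': [{'VI': inv_cov}],
     'algorithm': ['brute'], 'n_neighbors': [5, 10]},
]

grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid,
                    cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_['metric'], round(grid.best_score_, 3))
```

To search a range rather than a single value, list several dicts, e.g. 'metric_params': [{'V': v1}, {'V': v2}].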
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>