[scikit-learn] Random StratifiedKFold Grid Search CV

Raga Markely raga.markely at gmail.com
Thu Jan 26 17:39:41 EST 2017


I was trying to run a repeated grid search with cross-validation (20
repeats). I expected each call to GridSearchCV to split the data into
different training and test folds.

However, I got the same best_params_ and best_score_ for all 20 repeats.
It looks like the training and test sets are split into identical folds
in each run. To clarify with an example: I have the data 0,1,2,3,4, where
Class 1 = [0,1,2] and Class 2 = [3,4]. Suppose I set cv = 2. The split is
always, for instance, [0,3] [1,2,4] in each repeat, and I never get [1,3]
[0,2,4] or other combinations.
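
For illustration, the toy example above can be reproduced with a minimal
sketch like the following. It assumes the five samples are used as a single
feature column and that StratifiedKFold is called directly with its default
shuffle=False; the helper name fold_test_sets is hypothetical. The exact
indices in each fold may vary between scikit-learn versions, so the sketch
only checks that repeated calls agree with each other.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data from the post: samples 0..4, Class 1 = [0,1,2], Class 2 = [3,4].
X = np.arange(5).reshape(-1, 1)
y = np.array([1, 1, 1, 2, 2])


def fold_test_sets(cv):
    """Return the test indices of every fold as a list of tuples (hypothetical helper)."""
    return [tuple(int(i) for i in test) for _, test in cv.split(X, y)]


# Build a fresh unshuffled splitter 20 times, mimicking 20 repeated calls.
unshuffled = [fold_test_sets(StratifiedKFold(n_splits=2)) for _ in range(20)]

# Without shuffling the assignment is deterministic, so every repeat matches.
all_identical = all(split == unshuffled[0] for split in unshuffled)
print("identical folds in all repeats:", all_identical)
```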

If I understand correctly, GridSearchCV uses StratifiedKFold when I pass
cv = integer. StratifiedKFold has a random_state parameter; I wonder if
there is any way I can make the training and test sets be randomly
split each time I call GridSearchCV?
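
One way this could be approached is to pass an explicit StratifiedKFold
object with shuffle=True as the cv argument and change its random_state on
each repeat. The sketch below is only an illustration under stated
assumptions: the synthetic data, the LogisticRegression estimator settings,
and the parameter grid are placeholders, not the data or grid from this post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic placeholder data (assumption): 40 samples, 3 features, 2 balanced classes.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = np.array([0] * 20 + [1] * 20)

# Illustrative grid (assumption), not taken from the original post.
param_grid = {"C": [0.1, 1.0, 10.0]}

results = []
distinct_splits = set()
for repeat in range(20):
    # shuffle=True randomizes fold membership; a new seed per repeat gives new folds.
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=repeat)

    # Record the first fold's test indices to confirm the folds change between repeats.
    _, first_test = next(cv.split(X, y))
    distinct_splits.add(frozenset(int(i) for i in first_test))

    search = GridSearchCV(LogisticRegression(), param_grid, cv=cv)
    search.fit(X, y)
    results.append((search.best_params_, search.best_score_))

print("distinct first folds across repeats:", len(distinct_splits))
```

Using an integer seed per repeat keeps each run reproducible while still
varying the folds; within a single GridSearchCV call, all parameter
candidates are evaluated on the same folds.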

Just a note: I used the following classifiers: Logistic Regression, KNN,
SVC, Kernel SVC, and Random Forest, and made the same observation
regardless of the classifier.

Thank you very much!