[scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit

Raghav R V ragvrv at gmail.com
Sun Dec 4 16:27:02 EST 2016


Hi!

It looks like you are using the old `sklearn.cross_validation` module's
cross-validators (e.g. `LeaveOneOut(len(y))`). That module has been deprecated since v0.18.

Use the cross-validators from `sklearn.model_selection` instead (`LeaveOneOut`, or
`LeaveOneGroupOut`, which is the new name of `LeaveOneLabelOut`). That should fix
your issue, I think (though I have not looked into your code in detail).
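For reference, here is a minimal sketch of the `sklearn.model_selection` splitters, assuming scikit-learn 0.18 or later. The toy `X`, `y` and `subjects` arrays are made up. Unlike the old `cross_validation.LeaveOneOut(n)`, the new classes take no sample count and yield `(train, test)` index arrays from `.split()`. With one sample per subject, `LeaveOneGroupOut` over subject IDs gives the same folds as `LeaveOneOut`. With several samples per subject, only `LeaveOneGroupOut` keeps each subject's samples together.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, LeaveOneOut

# Hypothetical toy data: 26 subjects, one sample each, 13 per class.
rng = np.random.RandomState(0)
X = rng.randn(26, 10)
y = np.repeat([0, 1], 13)
subjects = np.arange(26)  # one subject ID per row

# New API: no length argument; the indices come from .split().
loo_folds = list(LeaveOneOut().split(X, y))
logo_folds = list(LeaveOneGroupOut().split(X, y, groups=subjects))

print(len(loo_folds), len(logo_folds))  # 26 26
```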

HTH!

On Sun, Dec 4, 2016 at 9:12 PM, Ludovico Coletta <ludo25_90 at hotmail.com>
wrote:

> Dear scikit experts,
>
> I'm struggling with the implementation of a nested cross validation.
>
> My data: I have 26 subjects (13 per class) x 6670 features. I used a
> feature reduction algorithm (you may have heard about Boruta) to reduce the
> dimensionality of my data. Problems start now: I defined LOSO as the outer
> partitioning scheme. Therefore, for each of the 26 CV folds I used 24
> subjects for feature reduction. This led to a different number of features
> in each CV fold. Now, for each CV fold I would like to use the same 24
> subjects for hyperparameter optimization (SVM with RBF kernel).
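Here is a rough sketch of how the nested LOSO could look with `sklearn.model_selection`, under a few assumptions. The data are synthetic (26 x 50 instead of 26 x 6670). `SelectKBest` is only a stand-in for Boruta, and unlike Boruta it keeps a fixed `k` features. The grid is also much smaller than the 13 x 13 one in the code you posted. Because the feature selection step sits inside the `Pipeline`, it is refit on the training subjects of every outer and inner fold, so the per-fold feature sets don't need separate bookkeeping.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data standing in for 26 subjects x 6670 features
# (fewer features so the sketch runs quickly).
rng = np.random.RandomState(42)
X = rng.randn(26, 50)
y = np.repeat([0, 1], 13)
X[y == 1, :5] += 1.5  # make a few features informative

# Feature selection lives INSIDE the pipeline, so it is refit on the
# training subjects of every (outer and inner) fold. SelectKBest is
# only a stand-in for Boruta here.
pipe = Pipeline([
    ("scl", StandardScaler()),
    ("sel", SelectKBest(f_classif, k=10)),
    ("clf", SVC(kernel="rbf")),
])

# Deliberately tiny illustrative grid.
param_grid = {
    "clf__C": np.logspace(0, 1, 2),
    "clf__gamma": np.logspace(-2, -1, 2),
}

# Inner loop: LOSO over the 25 training subjects of each outer fold.
# A splitter object (not precomputed indices) lets GridSearchCV build
# indices relative to the data it actually receives.
inner = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=LeaveOneOut())

# Outer loop: LOSO over all 26 subjects; each fold's score is 0 or 1.
outer_scores = cross_val_score(inner, X, y, scoring="accuracy", cv=LeaveOneOut())
print("nested LOSO accuracy: %.3f" % outer_scores.mean())
```

In each outer fold, the inner search uses every one of the 25 training subjects in turn as the validation subject (fitting on the other 24), rather than a single fixed 24/1 split. With only one validation subject, every grid point scores 0 or 1 and ties are almost guaranteed.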
>
> This is what I did:
>
>     cv = list(LeaveOneOut(len(y)))  # in y I stored the labels
>
>     inner_train = [None] * len(y)
>     inner_test = [None] * len(y)
>
>     ii = 0
>     while ii < len(y):
>         cv = list(LeaveOneOut(len(y)))
>         a = cv[ii][0]
>         a = a[:-1]
>         inner_train[ii] = a
>
>         b = cv[ii][0]
>         b = np.array(b[((len(cv[0][0])) - 1)])
>         inner_test[ii] = b
>
>         ii = ii + 1
>
>     custom_cv = zip(inner_train, inner_test)  # inner cv
>
>     pipe_logistic = Pipeline([('scl', StandardScaler()),
>                               ('clf', SVC(kernel="rbf"))])
>
>     parameters = [{'clf__C': np.logspace(-2, 10, 13),
>                    'clf__gamma': np.logspace(-9, 3, 13)}]
>
>     scores = [None] * (len(y))
>
>     ii = 0
>     while ii < len(scores):
>         a = data[ii][0]  # data for train
>         b = data[ii][1]  # data for test
>         c = np.concatenate((a, b))  # shape: number of subjects * number of features
>         d = cv[ii][0]  # labels for train
>         e = cv[ii][1]  # label for test
>         f = np.concatenate((d, e))
>
>         grid_search = GridSearchCV(estimator=pipe_logistic,
>                                    param_grid=parameters, verbose=1,
>                                    scoring='accuracy',
>                                    cv=zip(([custom_cv[ii][0]]), ([custom_cv[ii][1]])))
>
>         scores[ii] = cross_validation.cross_val_score(grid_search, c, y[f],
>                                                       scoring='accuracy',
>                                                       cv=zip(([cv[ii][0]]), ([cv[ii][1]])))
>
>         ii = ii + 1
>
>
>
> However, I got the following error message: index 25 is out of bounds for
> size 25
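The error most likely comes from the hand-made inner split. Its indices refer to positions in the full 26-subject array, but `GridSearchCV` only receives the 25 training rows of each outer fold, whose valid positions are 0..24. In the first outer fold subject 0 is held out, so the inner test index is 25, which reproduces the message. Here is a small demonstration with synthetic data, assuming the current `model_selection` API:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Hypothetical data: 26 subjects x 5 features.
X = np.random.RandomState(0).randn(26, 5)

# First outer LOSO fold: subject 0 is held out, train = [1, ..., 25].
train, test = next(LeaveOneOut().split(X))

# Inner test index built the way the quoted code does it: taken from
# the ORIGINAL 26-row indexing.
inner_test = train[-1:]  # array([25])

X_train = X[train]  # what the outer CV hands to GridSearchCV: 25 rows
caught = None
try:
    _ = X_train[inner_test]  # valid row positions are only 0..24
except IndexError as exc:
    caught = exc
    print("IndexError:", exc)

# Fix for a hand-made inner split: express it as positions within
# X_train, not as original subject indices.
inner_train_pos = np.arange(len(train) - 1)  # rows 0..23 of X_train
inner_test_pos = np.array([len(train) - 1])  # row 24 of X_train
print(X_train[inner_train_pos].shape, X_train[inner_test_pos].shape)
```

Passing a splitter object such as `LeaveOneOut()` as the `cv` of `GridSearchCV` avoids this bookkeeping entirely, because it generates indices for whatever data it is given.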
>
> Would it be so bad if I do not perform a nested LOSO but I use the default
> setting for hyperparameter optimization?
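If "default setting" means keeping SVC's default `C` and `gamma`, nothing is chosen from the data, so a single (non-nested) LOSO loop with feature selection inside the pipeline gives an estimate that is not biased by tuning. The price is that the defaults may simply not suit your data, and the default `gamma` also differs between scikit-learn versions. What would be optimistically biased is tuning on all 26 subjects and reporting the grid search's own best score. A minimal sketch, again on synthetic data with `SelectKBest` standing in for Boruta:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: 26 subjects x 50 features, a few informative.
rng = np.random.RandomState(42)
X = rng.randn(26, 50)
y = np.repeat([0, 1], 13)
X[y == 1, :5] += 1.5

# Fixed (default) SVC hyperparameters: nothing is tuned, so one LOSO
# loop suffices. Feature selection still stays inside the pipeline.
pipe_default = Pipeline([
    ("scl", StandardScaler()),
    ("sel", SelectKBest(f_classif, k=10)),
    ("clf", SVC(kernel="rbf")),
])
scores = cross_val_score(pipe_default, X, y, scoring="accuracy", cv=LeaveOneOut())
print("LOSO accuracy with default C/gamma: %.3f" % scores.mean())
```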
>
> Any help would be really appreciated.
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>


-- 
Raghav RV
https://github.com/raghavrv