[scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit

Andy t3kcit at gmail.com
Mon Dec 5 08:54:01 EST 2016


I'm not sure what the issue with your custom CV is, but this seems like a 
complicated way to implement it.
Try model_selection.LeaveOneGroupOut, which implements LOSO directly.
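
Something along these lines should work (an untested sketch; X, y and 
groups below are toy stand-ins for your data -- with one sample per 
subject, the group label is just the subject index):

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Toy placeholders: 26 subjects, 13 per class, one sample each.
    rng = np.random.RandomState(0)
    X = rng.randn(26, 6670)
    y = np.repeat([0, 1], 13)
    groups = np.arange(26)  # subject ID of each row

    pipe = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(kernel='rbf'))])
    param_grid = {'clf__C': np.logspace(-2, 10, 13),
                  'clf__gamma': np.logspace(-9, 3, 13)}

    logo = LeaveOneGroupOut()
    outer_scores = []
    # Outer loop: leave one subject out for testing.
    for train, test in logo.split(X, y, groups):
        # Inner loop: LOSO over the remaining subjects for tuning.
        inner_cv = list(LeaveOneGroupOut().split(X[train], y[train],
                                                 groups[train]))
        grid = GridSearchCV(pipe, param_grid, scoring='accuracy',
                            cv=inner_cv)
        grid.fit(X[train], y[train])
        outer_scores.append(grid.score(X[test], y[test]))

    print(np.mean(outer_scores))

If you can wrap your Boruta step as a transformer inside the pipeline, 
it will be refit on each training fold, so the per-fold feature 
selection stays inside the cross-validation.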

On 12/04/2016 03:12 PM, Ludovico Coletta wrote:
> Dear scikit experts,
>
> I'm struggling to implement a nested cross-validation.
>
> My data: I have 26 subjects (13 per class) x 6670 features. I used a 
> feature reduction algorithm (you may have heard of Boruta) to reduce 
> the dimensionality of my data. The problems start now: I defined LOSO 
> as the outer partitioning scheme. Therefore, for each of the 26 CV 
> folds I used 24 subjects for feature reduction. This led to a 
> different number of features in each CV fold. Now, for each CV fold, I 
> would like to use the same 24 subjects for hyperparameter optimization 
> (SVM with an RBF kernel).
>
> This is what I did:
>
> cv = list(LeaveOneOut(len(y)))  # in y I stored the labels
>
> inner_train = [None] * len(y)
> inner_test = [None] * len(y)
>
> ii = 0
> while ii < len(y):
>     cv = list(LeaveOneOut(len(y)))
>     a = cv[ii][0]
>     a = a[:-1]
>     inner_train[ii] = a
>
>     b = cv[ii][0]
>     b = np.array(b[len(cv[0][0]) - 1])
>     inner_test[ii] = b
>
>     ii = ii + 1
>
> custom_cv = zip(inner_train, inner_test)  # inner cv
>
> pipe_logistic = Pipeline([('scl', StandardScaler()),
>                           ('clf', SVC(kernel="rbf"))])
>
> parameters = [{'clf__C': np.logspace(-2, 10, 13),
>                'clf__gamma': np.logspace(-9, 3, 13)}]
>
> scores = [None] * len(y)
>
> ii = 0
> while ii < len(scores):
>     a = data[ii][0]  # data for train
>     b = data[ii][1]  # data for test
>     c = np.concatenate((a, b))  # shape: number of subjects x number of features
>     d = cv[ii][0]  # labels for train
>     e = cv[ii][1]  # label for test
>     f = np.concatenate((d, e))
>
>     grid_search = GridSearchCV(estimator=pipe_logistic,
>                                param_grid=parameters, verbose=1,
>                                scoring='accuracy',
>                                cv=zip([custom_cv[ii][0]],
>                                       [custom_cv[ii][1]]))
>
>     scores[ii] = cross_validation.cross_val_score(
>         grid_search, c, y[f], scoring='accuracy',
>         cv=zip([cv[ii][0]], [cv[ii][1]]))
>
>     ii = ii + 1
> However, I got the following error message: "index 25 is out of 
> bounds for size 25".
>
> Would it be so bad if I did not perform a nested LOSO but instead 
> used the default setting for hyperparameter optimization?
>
> Any help would be really appreciated.
>
