[scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit
Ludovico Coletta
ludo25_90 at hotmail.com
Mon Dec 5 08:39:40 EST 2016
Unfortunately, it did not work.
I think I am doing something wrong when passing the nested cv, but I do not see where. If I omit the cv argument in the grid search, it runs smoothly. I would like to use LeaveOneOut for both the outer and the inner cv; how would you implement that?
Best
Ludovico
________________________________
From: scikit-learn <scikit-learn-bounces+ludo25_90=hotmail.com at python.org> on behalf of scikit-learn-request at python.org <scikit-learn-request at python.org>
Sent: Sunday, 4 December 2016, 22:27
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 9, Issue 13
Send scikit-learn mailing list submissions to
scikit-learn at python.org
To subscribe or unsubscribe via the World Wide Web, visit
https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
scikit-learn-request at python.org
You can reach the person managing the list at
scikit-learn-owner at python.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."
Today's Topics:
1. Nested Leave One Subject Out (LOSO) cross validation with
scikit (Ludovico Coletta)
2. Re: Adding samplers for intersection/Jensen-Shannon kernels
(avn at mccme.ru)
3. Re: Nested Leave One Subject Out (LOSO) cross validation with
scikit (Raghav R V)
----------------------------------------------------------------------
Message: 1
Date: Sun, 4 Dec 2016 20:12:29 +0000
From: Ludovico Coletta <ludo25_90 at hotmail.com>
To: "scikit-learn at python.org" <scikit-learn at python.org>
Subject: [scikit-learn] Nested Leave One Subject Out (LOSO) cross
validation with scikit
Message-ID:
<BLUPR0301MB2017B71792C520F6ACE525478C800 at BLUPR0301MB2017.namprd03.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"
Dear scikit experts,
I'm struggling with the implementation of a nested cross validation.
My data: I have 26 subjects (13 per class) x 6670 features. I used a feature reduction algorithm (you may have heard of Boruta) to reduce the dimensionality of my data. Problems start now: I defined LOSO as the outer partitioning schema. Therefore, for each of the 26 cv folds I used 24 subjects for feature reduction. This led to a different number of features in each cv fold. Now, for each cv fold I would like to use the same 24 subjects for hyperparameter optimization (SVM with rbf kernel).
This is what I did:
cv = list(LeaveOneOut(len(y)))  # in y I stored the labels
inner_train = [None] * len(y)
inner_test = [None] * len(y)
ii = 0
while ii < len(y):
    cv = list(LeaveOneOut(len(y)))
    a = cv[ii][0]
    a = a[:-1]
    inner_train[ii] = a

    b = cv[ii][0]
    b = np.array(b[(len(cv[0][0])) - 1])
    inner_test[ii] = b

    ii = ii + 1

custom_cv = zip(inner_train, inner_test)  # inner cv
pipe_logistic = Pipeline([('scl', StandardScaler()), ('clf', SVC(kernel="rbf"))])
parameters = [{'clf__C': np.logspace(-2, 10, 13), 'clf__gamma': np.logspace(-9, 3, 13)}]
scores = [None] * len(y)
ii = 0
while ii < len(scores):
    a = data[ii][0]  # data for train
    b = data[ii][1]  # data for test
    c = np.concatenate((a, b))  # shape: number of subjects * number of features
    d = cv[ii][0]  # labels for train
    e = cv[ii][1]  # label for test
    f = np.concatenate((d, e))

    grid_search = GridSearchCV(estimator=pipe_logistic, param_grid=parameters,
                               verbose=1, scoring='accuracy',
                               cv=zip([custom_cv[ii][0]], [custom_cv[ii][1]]))
    scores[ii] = cross_validation.cross_val_score(grid_search, c, y[f],
                                                  scoring='accuracy',
                                                  cv=zip([cv[ii][0]], [cv[ii][1]]))
    ii = ii + 1
However, I got the following error message: index 25 is out of bounds for size 25
Would it be so bad if I did not perform a nested LOSO but instead used the default settings for hyperparameter optimization?
Any help would be really appreciated
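For what it's worth, a minimal sketch of what a nested LOSO setup can look like with the newer `sklearn.model_selection` API (assumptions: random placeholder data instead of the 26 x 6670 matrix, `SelectKBest` standing in for Boruta so the feature reduction is refit inside every fold, and one sample per subject so `LeaveOneOut` doubles as leave-one-subject-out; with several samples per subject you would pass a `groups` array to `LeaveOneGroupOut` instead):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 26 subjects (13 per class); far fewer features than 6670,
# just to keep the sketch quick to run.
rng = np.random.RandomState(0)
X = rng.randn(26, 30)
y = np.repeat([0, 1], 13)

# Feature reduction lives INSIDE the pipeline, so it is refit on the
# training portion of every fold -- nothing leaks from the held-out subject.
pipe = Pipeline([('reduce', SelectKBest(f_classif, k=10)),
                 ('scl', StandardScaler()),
                 ('clf', SVC(kernel='rbf'))])

param_grid = {'clf__C': np.logspace(-1, 1, 2),
              'clf__gamma': np.logspace(-2, 0, 2)}

# Inner loop: grid search with its own leave-one-out split of each training fold
grid = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=LeaveOneOut())

# Outer loop: every subject is held out exactly once
scores = cross_val_score(grid, X, y, scoring='accuracy', cv=LeaveOneOut())
print(len(scores))  # one accuracy (0 or 1) per held-out subject
```

With a single held-out sample per outer fold each score is 0 or 1, so the mean of `scores` is the nested-CV accuracy estimate.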
------------------------------
Message: 2
Date: Sun, 04 Dec 2016 23:50:21 +0300
From: avn at mccme.ru
To: Scikit-learn user and developer mailing list
<scikit-learn at python.org>
Subject: Re: [scikit-learn] Adding samplers for
intersection/Jensen-Shannon kernels
Message-ID: <0511d5fa33737f78ccdf7fbb2e5b2156 at mccme.ru>
Content-Type: text/plain; charset=UTF-8; format=flowed
I see now. So I'll proceed with adding documentation and unit tests for
those kernels to complete their support.
And I don't think they're too specialized, given that many kinds of
feature vectors in e.g. computer vision are in fact histograms and all
of those kernels are histogram-oriented.
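For readers following along: the histogram intersection kernel is k(x, z) = sum_i min(x_i, z_i), and even without a dedicated sampler it can be plugged into scikit-learn as a callable kernel (a sketch; the helper name is mine, and the toy histograms are made up):

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(X, Z):
    """Gram matrix of the histogram intersection kernel:
    K[i, j] = sum_k min(X[i, k], Z[j, k])."""
    return np.minimum(X[:, None, :], Z[None, :, :]).sum(axis=-1)

# Toy normalized histograms, two classes
X = np.array([[0.8, 0.1, 0.1],
              [0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8],
              [0.2, 0.1, 0.7]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel=intersection_kernel).fit(X, y)
print(clf.predict(X))
```

A sampler (in the spirit of `AdditiveChi2Sampler`) approximates this kernel with an explicit feature map instead of the full Gram matrix, which scales much better with the number of samples.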
Andy wrote on 2016-12-04 00:23:
> Hi Valery.
> I didn't include them because the Chi2 worked better for my task ;)
> In hindsight, I'm not sure whether these kernels aren't a bit too
> specialized for scikit-learn.
> But given that we have the (slightly more obscure) SkewedChi2 and
> AdditiveChi2,
> I think the intersection one would be a good addition if you found it
> useful.
>
> Andy
>
> On 12/03/2016 03:39 PM, Valery Anisimovsky via scikit-learn wrote:
>> Hello,
>>
>> In the course of my work, I've made samplers for
>> intersection/Jensen-Shannon kernels, just by small modifications to
>> sklearn.kernel_approximation.AdditiveChi2Sampler code. Intersection
>> kernel proved to be the best one for my task (clustering Docstrum
>> feature vectors), so perhaps it'd be good to add those samplers
>> alongside AdditiveChi2Sampler? Should I proceed with creating a pull
>> request? Or, perhaps, those kernels were not already included for some
>> good reason?
>>
>> With best regards,
>> -- Valery
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
------------------------------
Message: 3
Date: Sun, 4 Dec 2016 22:27:02 +0100
From: Raghav R V <ragvrv at gmail.com>
To: Scikit-learn user and developer mailing list
<scikit-learn at python.org>
Subject: Re: [scikit-learn] Nested Leave One Subject Out (LOSO) cross
validation with scikit
Message-ID:
<CACmxyDFRO0T_wxk8Z=sY-0CO2c2g-OFgqZvYjXQ5EYF7ksZ1-w at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi!
It looks like you are using the cross-validators from the old
`sklearn.cross_validation` module, which has been deprecated since v0.18.
Use the equivalents from `sklearn.model_selection` (for leave-one-subject-out,
that is `LeaveOneGroupOut`, the renamed `LeaveOneLabelOut`); that should fix
your issue I think (though I have not looked into your code in detail).
HTH!
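To illustrate the API difference (a sketch, assuming scikit-learn >= 0.18 and made-up subject ids): the `model_selection` cross-validators no longer take the data size in the constructor; they generate splits via `split()`, and leave-one-subject-out with several samples per subject is `LeaveOneGroupOut`:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.zeros((6, 2))                    # placeholder features
y = np.array([0, 0, 0, 1, 1, 1])        # labels
groups = np.array([1, 1, 2, 2, 3, 3])   # subject id of each sample

logo = LeaveOneGroupOut()               # no size argument, unlike the old API
# Each split holds out every sample belonging to one subject
folds = list(logo.split(X, y, groups))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```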
On Sun, Dec 4, 2016 at 9:12 PM, Ludovico Coletta <ludo25_90 at hotmail.com>
wrote:
> [...]
--
Raghav RV
https://github.com/raghavrv
------------------------------
Subject: Digest Footer
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
------------------------------
End of scikit-learn Digest, Vol 9, Issue 13
*******************************************
More information about the scikit-learn mailing list