[scikit-learn] anti-correlated predictions by SVR
gael.varoquaux at normalesup.org
Tue Sep 26 12:56:12 EDT 2017
I took my example from classification for didactic purposes. My hypothesis still holds: the splitting of the data creates anti-correlations between train and test (a depletion effect).
Basically, you shouldn't work with datasets that small.
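A minimal sketch (not from the thread) of the depletion effect described above, assuming leave-one-out cross-validation on a balanced 20-sample classification set:

```python
import numpy as np

# 20 samples, 10 of class a (0) and 10 of class b (1), as in the
# simplified example quoted below.
y = np.array([0] * 10 + [1] * 10)

accuracies = []
for i in range(len(y)):  # leave-one-out cross-validation
    train = np.delete(y, i)
    # After leaving one sample out, the held-out sample's class has 9
    # training samples and the other class has 10: the test point is
    # always in the training minority.  A majority-class predictor is
    # therefore always wrong.
    majority = np.bincount(train).argmax()
    accuracies.append(majority == y[i])

print(np.mean(accuracies))  # 0.0 -- far below chance on balanced data
```

The same depletion mechanism biases regression splits: each held-out value is drawn from the part of the distribution that is under-represented in the training fold.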
Sent from my phone, please excuse typos and briefness
On Sep 26, 2017, at 18:51, Thomas Evangelidis <tevang3 at gmail.com> wrote:
>I have very small training sets (10-50 observations). Currently, I am
>working with 16 observations for training and 25 for validation (test
>set). And I am doing Regression, not Classification (hence the SVR
>instead of SVC).
>On 26 September 2017 at 18:21, Gael Varoquaux
><gael.varoquaux at normalesup.org> wrote:
>> Hypothesis: you have a very small dataset and when you leave out data
>> you create a distribution shift between the train and the test. A
>> simplified example: 20 samples, 10 class a, 10 class b. A leave-one-out
>> cross-validation will create a training set of 10 samples of one class
>> and 9 samples of the other, and the test set is composed of the class
>> that is in minority on the train set.
>> On Tue, Sep 26, 2017 at 06:10:39PM +0200, Thomas Evangelidis wrote:
>> > Greetings,
>> > I don't know if anyone encountered this before, but sometimes I get
>> > anti-correlated predictions by the SVR that I am training. Namely,
>> > Pearson's R and Kendall's tau are negative when I compare the
>> > predictions on the external test set with the true values. However,
>> > the SVR predictions on the training set have positive correlations
>> > with the experimental values. I can't think of a way to know in
>> > advance if the trained SVR will give anti-correlated predictions in
>> > order to change their sign and avoid this disaster. Here is an
>> > example of what I mean:
>> > Training set predictions: R=0.452422, tau=0.333333
>> > External test set predictions: R=-0.537420, tau=-0.300000
>> > Obviously, in a real case scenario where I wouldn't have the
>> > external test set, I would have used the worst observations instead
>> > of the best ones. Does anyone have any idea about how I could
>> > prevent this?
>> > thanks in advance
>> > Thomas
>> Gael Varoquaux
>> Researcher, INRIA Parietal
>> NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette, France
>> Phone: ++ 33-1-69-08-79-68
>> scikit-learn mailing list
>> scikit-learn at python.org
>Dr Thomas Evangelidis
>CEITEC - Central European Institute of Technology
>62500 Brno, Czech Republic
>email: tevang at pharm.uoa.gr
> tevang3 at gmail.com
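Not from the thread, but one way to probe the question above: resample the small training set repeatedly and check whether the sign of the test-fold correlation is even stable. A hedged sketch on synthetic data (the data, kernel, and split sizes are assumptions, not Thomas's actual setup):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVR

rng = np.random.RandomState(0)
# Synthetic stand-in for a 16-observation training set.
X = rng.randn(16, 5)
y = X[:, 0] + 0.5 * rng.randn(16)

signs = []
splitter = ShuffleSplit(n_splits=50, test_size=4, random_state=0)
for train, test in splitter.split(X):
    model = SVR(kernel="linear", C=1.0).fit(X[train], y[train])
    r, _ = pearsonr(y[test], model.predict(X[test]))
    signs.append(np.sign(r))

# If R's sign flips across resamples, a negative R on any single
# external set says little about the model -- the estimate itself
# is too unstable at this sample size to justify flipping the sign.
print("fraction of negative-R splits:", np.mean(np.array(signs) < 0))
```

A high or highly variable fraction here is a warning that no sign correction is trustworthy; collecting more data (as suggested at the top of the thread) is the only real fix.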