[scikit-learn] anti-correlated predictions by SVR
Sebastian Raschka
se.raschka at gmail.com
Tue Sep 26 12:58:00 EDT 2017
I'd agree with Gael that a potential explanation could be a distribution shift upon splitting (usually the smaller the dataset, the more of an issue this is). As potential solutions/workarounds, you could try:
a) stratified sampling for regression, if you'd like to stick with the 2-way holdout method
b) use leave-one-out cross validation for evaluation (your model will likely benefit from the additional training samples)
c) use the leave-one-out bootstrap (at each round, draw a bootstrap sample from the dataset for training, then use the points not in the training sample for testing)
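Options (b) and (c) could be sketched roughly as follows; this is only an illustration on synthetic data (the array shapes, kernel choice, and number of bootstrap rounds are placeholders, not a recommendation):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.RandomState(0)
X = rng.randn(16, 5)                     # 16 observations, as in Thomas's case
y = X[:, 0] + 0.1 * rng.randn(16)        # synthetic targets for illustration

# (b) leave-one-out cross-validation: each sample is predicted once by a
# model trained on the remaining n-1 samples, so no training data is wasted.
loo_pred = cross_val_predict(SVR(kernel="rbf"), X, y, cv=LeaveOneOut())
r_loo, _ = pearsonr(y, loo_pred)

# (c) leave-one-out bootstrap: train on a bootstrap sample, evaluate on the
# "out-of-bag" points that were not drawn, and aggregate over many rounds.
scores = []
for _ in range(200):
    train_idx = rng.choice(len(X), size=len(X), replace=True)
    oob_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    if len(oob_idx) < 3:
        continue  # need a few points for a meaningful correlation
    model = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
    r, _ = pearsonr(y[oob_idx], model.predict(X[oob_idx]))
    scores.append(r)
mean_oob_r = np.mean(scores)
```

With only 16 training points, the spread of the per-round out-of-bag correlations is as informative as the mean: a wide spread (including negative values) would suggest the anti-correlation you see is within the sampling noise of such a small set.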
Best,
Sebastian
> On Sep 26, 2017, at 12:48 PM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
>
> I have very small training sets (10-50 observations). Currently, I am working with 16 observations for training and 25 for validation (external test set). And I am doing Regression, not Classification (hence the SVR instead of SVC).
>
>
> On 26 September 2017 at 18:21, Gael Varoquaux <gael.varoquaux at normalesup.org> wrote:
> Hypothesis: you have a very small dataset and when you leave out data,
> you create a distribution shift between the train and the test. A
> simplified example: 20 samples, 10 class a, 10 class b. A leave-one-out
> cross-validation will create a training set of 10 samples of one class, 9
> samples of the other, and the test set is composed of the class that is
> minority on the train set.
>
> G
>
> On Tue, Sep 26, 2017 at 06:10:39PM +0200, Thomas Evangelidis wrote:
> > Greetings,
>
> > I don't know if anyone encountered this before, but sometimes I get
> > anti-correlated predictions by the SVR I that am training. Namely, the
> > Pearson's R and Kendall's tau are negative when I compare the predictions on
> the external test set with the true values. However, the SVR predictions on the
> training set have positive correlations with the experimental values, and hence
> I can't think of a way to know in advance whether the trained SVR will produce
> anti-correlated predictions, so that I could flip their sign and avoid the
> disaster. Here is an example of what I mean:
>
> > Training set predictions: R=0.452422, tau=0.333333
> > External test set predictions: R=-0.537420, tau=-0.300000
>
> > Obviously, in a real case scenario where I wouldn't have the external test set
> > I would have used the worst observation instead of the best ones. Has anybody
> > any idea about how I could prevent this?
>
> > thanks in advance
> > Thomas
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
>
> --
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049,
> 62500 Brno, Czech Republic
>
> email: tevang at pharm.uoa.gr
> tevang3 at gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>