[scikit-learn] anti-correlated predictions by SVR

Gael Varoquaux gael.varoquaux at normalesup.org
Tue Sep 26 12:21:37 EDT 2017

Hypothesis: you have a very small dataset, and when you leave out data
you create a distribution shift between the train and the test sets. A
simplified example: 20 samples, 10 of class a and 10 of class b. Each
leave-one-out split yields a training set with 10 samples of one class and
9 of the other, and the single test sample always belongs to the class
that is in the minority in the training set.
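
To make the effect concrete, here is a minimal sketch (not from the
original thread). It assumes pure-noise features and uses scikit-learn's
DummyClassifier and DummyRegressor as stand-ins for a model that has
learned nothing beyond the training-set statistics. The variable names
and the random data are illustrative assumptions only.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# 20 samples, 10 per class; the features are pure noise (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y_class = np.array([0] * 10 + [1] * 10)

# Always predicts the majority class of its training fold. Under
# leave-one-out the held-out sample is in the training minority,
# so every prediction is wrong.
clf = DummyClassifier(strategy="most_frequent")
pred_class = cross_val_predict(clf, X, y_class, cv=LeaveOneOut())
loo_accuracy = np.mean(pred_class == y_class)

# Regression analogue: predict the training-fold mean of y. Removing
# a high target lowers that mean (and vice versa), so the predictions
# are a decreasing linear function of the held-out targets.
y_reg = rng.normal(size=20)
reg = DummyRegressor(strategy="mean")
pred_reg = cross_val_predict(reg, X, y_reg, cv=LeaveOneOut())
loo_pearson = np.corrcoef(y_reg, pred_reg)[0, 1]

print("LOO accuracy:", loo_accuracy)
print("LOO Pearson R:", loo_pearson)
```

In this sketch the accuracy is zero and the correlation is strongly
negative even though the model never saw any signal. A real SVR that
fits the data only weakly may show a milder version of the same effect.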


On Tue, Sep 26, 2017 at 06:10:39PM +0200, Thomas Evangelidis wrote:
> Greetings,

> I don't know if anyone encountered this before, but sometimes I get
> anti-correlated predictions by the SVR that I am training. Namely, the
> Pearson's R and Kendall's tau are negative when I compare the predictions on
> the external test set with the true values. However, the SVR predictions on the
> training set have positive correlations with the experimental values and hence
> I can't think of a way to know in advance if the trained SVR will produce
> anti-correlated predictions in order to change their sign and avoid the
> disaster. Here is an example of what I mean:

> Training set predictions: R=0.452422, tau=0.333333
> External test set predictions: R=-0.537420, tau=-0.300000

> Obviously, in a real-case scenario where I wouldn't have the external test set
> I would have used the worst observations instead of the best ones. Does anybody
> have any idea how I could prevent this?

> thanks in advance
> Thomas
    Gael Varoquaux
    Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
