[scikit-learn] cross validation scores seem off for PLSRegression

Paul Anton Letnes pa at letnes.com
Tue Feb 14 06:27:11 EST 2017


@ is a Python operator (available since Python 3.5) meaning "matrix multiplication". It is plain Python, not IPython-specific syntax.

<https://www.python.org/dev/peps/pep-0465/>
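
As a quick illustration, here is a minimal sketch with two made-up arrays (a and b are not from my session; they only show that @ gives the same result as np.matmul):

```python
import numpy as np

# Hypothetical toy arrays, chosen only so that the shapes are easy to follow.
a = np.arange(6.0).reshape(2, 3)   # shape (2, 3)
b = np.arange(12.0).reshape(3, 4)  # shape (3, 4)

# The @ operator performs matrix multiplication (PEP 465).
product = a @ b

# For 2-D arrays this is the same operation as np.matmul.
same_as_matmul = np.array_equal(product, np.matmul(a, b))

print(product.shape)
print(same_as_matmul)
```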

I deliberately set y to the model's own prediction, so that the PLS model should be able to reproduce the values exactly and give a sensible score.

Paul


On 14 February 2017 at 12:08:11 +01:00, Fabian Böhnlein <fabian.boehnlein at gmail.com> wrote:

> Hi Paul,
> 
> Not sure what the @ syntax does in IPython, but it seems you're setting y to the coefficients of the model instead of y_hat = pls.predict(x).
> 
> Also see in the documentation why R^2 can be negative: <http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.score>
> 
> Best,
> Fabian
> 
> 
> On Tue, 14 Feb 2017 at 11:57 Paul Anton Letnes <pa at letnes.com> wrote:
> 
> > Hi!
> > 
> > Versions:
> > sklearn 0.18.1
> > numpy 1.11.3
> > Anaconda python 3.5 on ubuntu 16.04
> > 
> > What range is the cross_val_score supposed to be in? I was under the impression from the documentation, although I cannot find it stated explicitly anywhere, that it should be a number in the range [0, 1]. However, it appears that one can get large negative values; see the ipython session below.
> > 
> > Cheers
> > Paul
> > 
> > In [2]: import numpy as np
> > 
> > In [3]: y = np.random.random((10, 3))
> > 
> > In [4]: x = np.random.random((10, 17))
> > 
> > In [5]: from sklearn.cross_decomposition import PLSRegression
> > 
> > In [6]: pls = PLSRegression(n_components=3)
> > 
> > In [7]: from sklearn.cross_validation import cross_val_score
> > 
> > In [8]: from sklearn.model_selection import cross_val_score
> > 
> > In [9]: cross_val_score(pls, x, y)
> > Out[9]: array([-32.52217837, -4.17228083, -5.88632365])
> > 
> > 
> > PS:
> > This happens even if I cheat by setting y to the predicted value, and cross validate on that.
> > 
> > In [29]: y = x @ pls.coef_
> > 
> > In [30]: cross_val_score(pls, x, y)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 5
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > /home/paul/anaconda3/envs/wp3-paper/lib/python3.5/site-packages/sklearn/cross_decomposition/pls_.py:293: UserWarning: Y residual constant at iteration 6
> > warnings.warn('Y residual constant at iteration %s' % k)
> > Out[30]: array([-35.01267353, -4.94806383, -5.9619526 ])
> > 
> > In [34]: np.max(np.abs(y - x @ pls.coef_))
> > Out[34]: 0.0
> > 
> > 
