[scikit-learn] LASSO: Predicted values show negative correlation with observed values on random data

Martin Watzenboeck martin.watzenboeck at gmail.com
Tue Apr 2 14:57:51 EDT 2019


Hello,

I applied LASSO regression with leave-one-out cross-validation (LeaveOneOut)
to my data and observed a significant negative correlation between the
predicted and observed response values. I was able to reproduce the problem
with random data (please see the code below).

Does anyone have an idea what I am doing wrong? I would very much like to use
LASSO regression on my data. Thanks a lot!

Cheers,
Martin

# Lasso example
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut

n_samples = 500
n_features = 30

# Create random features
rng = np.random.RandomState(seed=42)
X = rng.randn(n_samples, n_features)

# Create random responses, independent of X
Y = rng.randn(n_samples)

# Instantiate the regressor (alpha is left at its default of 1.0) and the CV object
cv = LeaveOneOut()
reg = Lasso(random_state=42)

# Lists to collect predicted and observed Y values
pred = []
obs = []

# Run cross-validation
for train, test in cv.split(X, Y):
    # Fit the regressor on all samples except the held-out one
    reg.fit(X[train], Y[train])

    # Store the predicted and observed value for the held-out sample
    pred.append(reg.predict(X[test])[0])
    obs.append(Y[test][0])

# Test the correlation between predicted and observed values
print(pearsonr(pred, obs))
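
For comparison, the same leave-one-out procedure can be sketched more
compactly with cross_val_predict. This is only a sketch under the assumptions
of the example above (random data, Lasso with its default alpha=1.0). The
name n_nonzero is illustrative and not part of the original code; it counts
how many coefficients Lasso keeps when fitted on the full data, which may be
worth checking when interpreting the correlation.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict

n_samples, n_features = 500, 30

# Same random data as in the example above
rng = np.random.RandomState(seed=42)
X = rng.randn(n_samples, n_features)
Y = rng.randn(n_samples)

reg = Lasso(random_state=42)  # default alpha=1.0, as above

# One out-of-sample prediction per sample, returned in the original order
pred = cross_val_predict(reg, X, Y, cv=LeaveOneOut())

# Correlation between predicted and observed values
r, p_value = pearsonr(pred, Y)
print("Pearson r:", r)

# Illustrative check (not in the original code): number of features
# the model keeps when fitted on the full data
n_nonzero = int(np.count_nonzero(reg.fit(X, Y).coef_))
print("Non-zero coefficients:", n_nonzero)
```

If n_nonzero is 0, the fitted model predicts only the mean of the training
responses, so the predictions carry no information from X.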