Use of Scaler with LassoCV, RidgeCV
Hi all,

I am trying to use scikit-learn's LassoCV/RidgeCV while fitting a StandardScaler on each fold's training set. To avoid leakage, I do not want to apply the scaler before the cross-validation, but I cannot figure out how I am supposed to do that with LassoCV/RidgeCV.

Is there a way to do this? Or should I create a pipeline with Lasso/Ridge and "manually" search for the hyper-parameters (using GridSearchCV, for instance)?

Many thanks.
Yoann
Hmm. I would fit the scaler on the training data, and then apply the same scaling to the test and validation data. The following post isn't quite what you asked about, but it's close and does involve transformations and pipelines. Perhaps you can adapt it to your use case by introducing the scaling before PolynomialFeatures is called.

https://www.datarobot.com/blog/regularized-linear-regression-with-scikit-lea...

Dale Smith
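As a minimal sketch of "scale the training data, then reuse the same scaling": the scaler's mean and standard deviation are learned from the training split only, then applied unchanged to the held-out split. The synthetic data, the split sizes, and all variable names here are assumptions for illustration, not something from the thread.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (assumption, not from the thread).
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learn mean and scale from the training split only.
scaler = StandardScaler().fit(X_train)

# Apply the same (training-derived) transformation to both splits.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The held-out split never influences scaler.mean_ or scaler.scale_, which is what prevents leakage.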
Hi, Yoann,

if I understand correctly, you want to apply the scaling within each iteration of cross-validation (i.e., the recommended way to do it)? Here, you could use the make_pipeline function. When the resulting pipeline is cross-validated, the scaler is fit on each training fold, and that fitted scaler is then used to transform the corresponding test fold:

    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    # In scikit-learn releases before 0.18 this import lives in sklearn.cross_validation.
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import Ridge

    # X and y are your feature matrix and target vector.
    pipe = make_pipeline(StandardScaler(), Ridge())
    cross_val_score(pipe, X, y)

You can think of "pipe" as a Ridge estimator with a StandardScaler attached to it.

Best,
Sebastian
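For the hyper-parameter search raised in the original question, one possible approach is to wrap the same kind of pipeline in GridSearchCV, so that the scaler is refit on each training fold for every candidate alpha. This is only a sketch: the synthetic data, the alpha grid, cv=5, and max_iter=10000 are assumptions chosen for illustration. The parameter name "lasso__alpha" follows from make_pipeline naming each step after its lowercased class name.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(120, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 1.0])
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=120)

# Scaler and model travel together, so scaling is fit per training fold.
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))

# Illustrative grid over the Lasso regularization strength.
param_grid = {"lasso__alpha": np.logspace(-3, 1, 9)}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_alpha = search.best_params_["lasso__alpha"]
best_score = search.best_score_  # mean cross-validated R^2 for best_alpha
print(best_alpha, best_score)
```

Swapping Lasso for Ridge (and "lasso__alpha" for "ridge__alpha") should give the Ridge equivalent. After fitting, search.best_estimator_ is the whole pipeline refit on all of X and y, so it can be used directly for prediction on unscaled inputs.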
participants (3)
- Brenet, Yoann
- Dale T Smith
- Sebastian Raschka