[scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1
michael.eickenberg at gmail.com
Wed May 29 13:42:04 EDT 2019
I think there was an effort to compare normalization methods on the data
attachment term between Lasso and Ridge regression back in 2012/13, but
this might not have been finished or extended to Logistic Regression.
If it is not documented well, it could definitely benefit from a
documentation PR.
As for changing it to a more consistent state, that would require adding a
keyword argument pertaining to this functionality and, after discussion,
possibly changing the default value after some deprecation cycles (though
this seems like a dangerous one to change at all imho).
On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.livezey at gmail.com> wrote:
> Hi everyone,
> I noticed recently that in the Lasso implementation (and docs), the MSE
> term is normalized by the number of samples
> but for LogisticRegression + L1, the logloss does not seem to be
> normalized by the number of samples. One consequence is that the strength
> of the regularization depends on the number of samples explicitly. For
> instance, in Lasso, if you tile a dataset N times, you will learn the same
> coef, but in LogisticRegression, you will learn a different coef.
> Is this the intended behavior of LogisticRegression? I was surprised by
> this. Either way, it would be helpful to document this more clearly in the
> Logistic Regression docs (I can make a PR.)
> scikit-learn mailing list
> scikit-learn at python.org
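The tiling behavior Jesse describes can be checked directly. A small sketch (the dataset, alpha, C, and solver choice here are illustrative assumptions, not taken from the thread): Lasso's objective averages the MSE over samples, so tiling the data leaves the fit unchanged, while LogisticRegression's summed log-loss effectively multiplies C by the tiling factor, so the coefficients shrink less.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

# Illustrative synthetic data (assumed for the demo).
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y_reg = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)
y_clf = (y_reg > 0).astype(int)

# Tile the dataset 3 times.
X3 = np.tile(X, (3, 1))
y_reg3 = np.tile(y_reg, 3)
y_clf3 = np.tile(y_clf, 3)

# Lasso: (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1
# The per-sample average makes the solution invariant to tiling.
lasso = Lasso(alpha=0.1).fit(X, y_reg)
lasso3 = Lasso(alpha=0.1).fit(X3, y_reg3)
print(np.allclose(lasso.coef_, lasso3.coef_, atol=1e-6))  # True

# LogisticRegression: ||w||_1 + C * sum_i logloss_i (no 1/n factor),
# so tiling 3x acts like replacing C with 3 * C.
lr = LogisticRegression(penalty='l1', C=1.0, solver='liblinear').fit(X, y_clf)
lr3 = LogisticRegression(penalty='l1', C=1.0, solver='liblinear').fit(X3, y_clf3)
print(np.allclose(lr.coef_, lr3.coef_))  # False
```

Passing C / n_samples instead of C would make the logistic fit invariant to tiling, which is one way users can emulate the Lasso-style normalization today.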