[scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

Wed May 29 13:42:04 EDT 2019

Hi Jesse,

I think there was an effort back in 2012/13 to compare how the data-fit
(loss) term is normalized in Lasso and Ridge regression, but it may never
have been finished or extended to Logistic Regression.

If it is not documented well, it could definitely benefit from a
documentation update.

As for changing it to a more consistent state, that would require adding a
keyword argument that controls the normalization and, after discussion,
possibly changing its default value after some deprecation cycles (though
this seems like a dangerous one to change at all imho).
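
For reference, here is a minimal sketch of the tiling check Jesse describes.
The synthetic data, alpha, C, the tile count and the choice of the liblinear
solver are all arbitrary assumptions for illustration. Dividing C by the
number of tiles is just the rescaling implied by an unnormalized loss, not an
existing scikit-learn option.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import Lasso, LogisticRegression

n_tiles = 10  # how many times the dataset is repeated (arbitrary)

# --- Lasso: the squared-error term is divided by 2 * n_samples ---
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_tiled, y_tiled = np.tile(X, (n_tiles, 1)), np.tile(y, n_tiles)

lasso_kwargs = dict(alpha=1.0, tol=1e-10, max_iter=100000)
lasso = Lasso(**lasso_kwargs).fit(X, y)
lasso_tiled = Lasso(**lasso_kwargs).fit(X_tiled, y_tiled)
lasso_gap = np.max(np.abs(lasso.coef_ - lasso_tiled.coef_))

# --- LogisticRegression + L1: the log loss is summed, not averaged ---
Xc, yc = make_classification(n_samples=100, n_features=10, n_informative=3,
                             random_state=0)
Xc_tiled, yc_tiled = np.tile(Xc, (n_tiles, 1)), np.tile(yc, n_tiles)

C = 0.1
logreg_kwargs = dict(penalty="l1", solver="liblinear", tol=1e-8,
                     max_iter=10000, random_state=0)
logreg = LogisticRegression(C=C, **logreg_kwargs).fit(Xc, yc)

# Same C on the tiled data: the loss term is n_tiles times larger,
# so the penalty is effectively weaker.
logreg_same_C = LogisticRegression(C=C, **logreg_kwargs).fit(Xc_tiled, yc_tiled)
logreg_gap_same_C = np.max(np.abs(logreg.coef_ - logreg_same_C.coef_))

# Illustrative rescaling (an assumption, not a library feature):
# shrinking C by n_tiles compensates for the repeated samples.
logreg_rescaled = LogisticRegression(C=C / n_tiles, **logreg_kwargs).fit(
    Xc_tiled, yc_tiled)
logreg_gap_rescaled_C = np.max(np.abs(logreg.coef_ - logreg_rescaled.coef_))

print("Lasso, tiled vs original:          ", lasso_gap)
print("LogReg L1, same C, tiled vs orig:  ", logreg_gap_same_C)
print("LogReg L1, C / n_tiles, tiled/orig:", logreg_gap_rescaled_C)
```

If the loss really is unnormalized, the first and third gaps should be close
to zero while the second is clearly not.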

Michael

On Wed, May 29, 2019 at 10:38 AM Jesse Livezey <jesse.livezey at gmail.com>
wrote:

> Hi everyone,
>
> I noticed recently that in the Lasso implementation (and docs), the MSE
> term is normalized by the number of samples
>
> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
>
> but for LogisticRegression + L1, the logloss does not seem to be
> normalized by the number of samples. One consequence is that the strength
> of the regularization depends on the number of samples explicitly. For
> instance, in Lasso, if you tile a dataset N times, you will learn the same
> coef, but in LogisticRegression, you will learn a different coef.
>
> Is this the intended behavior of LogisticRegression? I was surprised by
> this. Either way, it would be helpful to document this more clearly in the
> Logistic Regression docs (I can make a PR.)
>
> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>
> Jesse