[scikit-learn] logistic regression results are not stable between solvers
Benoît Presles
benoit.presles at u-bourgogne.fr
Thu Jan 9 09:22:37 EST 2020
Hi Andy,
As you can notice in the code, I fixed C=1e9, so the intercept with
liblinear is not penalised and therefore I get the same solutions with
these solvers when everything goes well.
How can I check the objective of the l-bfgs and liblinear solvers with
sklearn?
Best regards,
Ben
On 08/01/2020 21:53, Andreas Mueller wrote:
> Hi Ben.
>
> Liblinear and l-bfgs might both converge but to different solutions,
> given that the intercept is penalized.
> There is also problems with ill-conditioned problems that are hard to
> detect.
> My impression of SAGA was that the convergence checks are too loose
> and we should improve them.
> Have you checked the objective of the l-bfgs and liblinear solvers?
> With ill-conditioned data the objectives could be similar with
> different solutions.
>
> It's not intended for scikit-learn to warn about ill-conditioned
> problems, I think, only convergence issues.
>
> Hth,
> Andy
>
>
> On 1/8/20 3:31 PM, Benoît Presles wrote:
>> With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear
>> n_iter_ = 64.
>>
>>
>> On 08/01/2020 21:18, Guillaume Lemaître wrote:
>>> We issue convergence warning. Can you check n_iter to be sure that
>>> you did not convergence to the stated convergence?
>>>
>>> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
>>> <benoit.presles at u-bourgogne.fr
>>> <mailto:benoit.presles at u-bourgogne.fr>> wrote:
>>>
>>> Dear sklearn users,
>>>
>>> I still have some issues concerning logistic regression.
>>> I did compare on the same data (simulated data) sklearn with
>>> three different solvers (lbfgs, saga, liblinear) and statsmodels.
>>>
>>> When everything goes well, I get the same results between lbfgs,
>>> saga, liblinear and statsmodels. When everything goes wrong, all
>>> the results are different.
>>>
>>> In fact, when everything goes wrong, statsmodels gives me a
>>> convergence warning (Warning: Maximum number of iterations has
>>> been exceeded. Current function value: inf Iterations: 20000) +
>>> an error (numpy.linalg.LinAlgError: Singular matrix).
>>>
>>> Why sklearn does not tell me anything? How can I know that I
>>> have convergence issues with sklearn?
>>>
>>>
>>> Thanks for your help,
>>> Best regards,
>>> Ben
>>>
>>> --------------------------------------------
>>>
>>> Here is the code I used to generate synthetic data:
>>>
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import StratifiedShuffleSplit
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.linear_model import LogisticRegression
>>> import statsmodels.api as sm
>>> #
>>> RANDOM_SEED = 2
>>> #
>>> X_sim, y_sim = make_classification(n_samples=200,
>>> n_features=20,
>>> n_informative=10,
>>> n_redundant=0,
>>> n_repeated=0,
>>> n_classes=2,
>>> n_clusters_per_class=1,
>>> random_state=RANDOM_SEED,
>>> shuffle=False)
>>> #
>>> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>>> random_state=RANDOM_SEED)
>>> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>>> X_split_train, X_split_test = X_sim[train_index_split],
>>> X_sim[test_index_split]
>>> y_split_train, y_split_test = y_sim[train_index_split],
>>> y_sim[test_index_split]
>>> ss = StandardScaler()
>>> X_split_train = ss.fit_transform(X_split_train)
>>> X_split_test = ss.transform(X_split_test)
>>> #
>>> classifier_lbfgs = LogisticRegression(fit_intercept=True,
>>> max_iter=20000000, verbose=0, random_state=RANDOM_SEED, C=1e9,
>>> solver='lbfgs',
>>> penalty='none', tol=1e-6)
>>> classifier_lbfgs.fit(X_split_train, y_split_train)
>>> print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>>> print(classifier_lbfgs.intercept_)
>>> print(classifier_lbfgs.coef_)
>>> #
>>> classifier_saga = LogisticRegression(fit_intercept=True,
>>> max_iter=20000000, verbose=0, random_state=RANDOM_SEED, C=1e9,
>>> solver='saga',
>>> penalty='none', tol=1e-6)
>>> classifier_saga.fit(X_split_train, y_split_train)
>>> print('classifier saga iter:', classifier_saga.n_iter_)
>>> print(classifier_saga.intercept_)
>>> print(classifier_saga.coef_)
>>> #
>>> classifier_liblinear =
>>> LogisticRegression(fit_intercept=True, max_iter=20000000,
>>> verbose=0, random_state=RANDOM_SEED,
>>> C=1e9,
>>> solver='liblinear', penalty='l2', tol=1e-6)
>>> classifier_liblinear.fit(X_split_train, y_split_train)
>>> print('classifier liblinear iter:',
>>> classifier_liblinear.n_iter_)
>>> print(classifier_liblinear.intercept_)
>>> print(classifier_liblinear.coef_)
>>> # statsmodels
>>> logit = sm.Logit(y_split_train,
>>> sm.tools.add_constant(X_split_train))
>>> logit_res = logit.fit(maxiter=20000)
>>> print("Coef statsmodels")
>>> print(logit_res.params)
>>>
>>>
>>>
>>> On 11/10/2019 15:42, Andreas Mueller wrote:
>>>>
>>>>
>>>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>>>
>>>>> Thanks for your answers.
>>>>>
>>>>> On my real data, I do not have so many samples. I have a bit
>>>>> more than 200 samples in total and I also would like to get
>>>>> some results with unpenalized logisitic regression.
>>>>> What do you suggest? Should I switch to the lbfgs solver?
>>>> Yes.
>>>>> Am I sure that with this solver I will not have any
>>>>> convergence issue and always get the good result? Indeed, I
>>>>> did not get any convergence warning with saga, so I thought
>>>>> everything was fine. I noticed some issues only when I decided
>>>>> to test several solvers. Without comparing the results across
>>>>> solvers, how to be sure that the optimisation goes well?
>>>>> Shouldn't scikit-learn warn the user somehow if it is not the
>>>>> case?
>>>> We should attempt to warn in the SAGA solver if it doesn't
>>>> converge. That it doesn't raise a convergence warning should
>>>> probably be considered a bug.
>>>> It uses the maximum weight change as a stopping criterion right
>>>> now.
>>>> We could probably compute the dual objective once in the end to
>>>> see if we converged, right? Or is that not possible with SAGA?
>>>> If not, we might want to caution that no convergence warning
>>>> will be raised.
>>>>
>>>>>
>>>>> At last, I was using saga because I also wanted to do some
>>>>> feature selection by using l1 penalty which is not supported
>>>>> by lbfgs...
>>>> You can use liblinear then.
>>>>
>>>>
>>>>>
>>>>> Best regards,
>>>>> Ben
>>>>>
>>>>>
>>>>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>>>>> Ups I did not see the answer of Roman. Sorry about that. It
>>>>>> is coming back to the same conclusion :)
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>>>> <g.lemaitre58 at gmail.com <mailto:g.lemaitre58 at gmail.com>> wrote:
>>>>>>
>>>>>> Uhm actually increasing to 10000 samples solve the
>>>>>> convergence issue.
>>>>>> SAGA is not designed to work with a so small sample size
>>>>>> most probably.
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>>>> <g.lemaitre58 at gmail.com <mailto:g.lemaitre58 at gmail.com>>
>>>>>> wrote:
>>>>>>
>>>>>> I slightly change the bench such that it uses
>>>>>> pipeline and plotted the coefficient:
>>>>>>
>>>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>>>>
>>>>>> I only see one of the 10 splits where SAGA is not
>>>>>> converging, otherwise the coefficients
>>>>>> look very close (I don't attach the figure here but
>>>>>> they can be plotted using the snippet).
>>>>>> So apart from this second split, the other
>>>>>> differences seems to be numerical instability.
>>>>>>
>>>>>> Where I have some concern is regarding the
>>>>>> convergence rate of SAGA but I have no
>>>>>> intuition to know if this is normal or not.
>>>>>>
>>>>>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>>>> <rth.yurchak at gmail.com
>>>>>> <mailto:rth.yurchak at gmail.com>> wrote:
>>>>>>
>>>>>> Ben,
>>>>>>
>>>>>> I can confirm your results with penalty='none'
>>>>>> and C=1e9. In both cases,
>>>>>> you are running a mostly unpenalized logisitic
>>>>>> regression. Usually
>>>>>> that's less numerically stable than with a small
>>>>>> regularization,
>>>>>> depending on the data collinearity.
>>>>>>
>>>>>> Running that same code with
>>>>>> - larger penalty ( smaller C values)
>>>>>> - or larger number of samples
>>>>>> yields for me the same coefficients (up to some
>>>>>> tolerance).
>>>>>>
>>>>>> You can also see that SAGA convergence is not
>>>>>> good by the fact that it
>>>>>> needs 196000 epochs/iterations to converge.
>>>>>>
>>>>>> Actually, I have often seen convergence issues
>>>>>> with SAG on small
>>>>>> datasets (in unit tests), not fully sure why.
>>>>>>
>>>>>> --
>>>>>> Roman
>>>>>>
>>>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>>>> > The predictions across solver are exactly the
>>>>>> same when I run the code.
>>>>>> > I am using 0.21.3 version. What is yours?
>>>>>> >
>>>>>> >
>>>>>> > In [13]: import sklearn
>>>>>> >
>>>>>> > In [14]: sklearn.__version__
>>>>>> > Out[14]: '0.21.3'
>>>>>> >
>>>>>> >
>>>>>> > Serafeim
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>>>> <benoit.presles at u-bourgogne.fr
>>>>>> <mailto:benoit.presles at u-bourgogne.fr>
>>>>>> >> <mailto:benoit.presles at u-bourgogne.fr
>>>>>> <mailto:benoit.presles at u-bourgogne.fr>>> wrote:
>>>>>> >>
>>>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > scikit-learn mailing list
>>>>>> > scikit-learn at python.org
>>>>>> <mailto:scikit-learn at python.org>
>>>>>> >
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>> >
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> <mailto:scikit-learn at python.org>
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Guillaume Lemaitre
>>>>>> Scikit-learn @ Inria Foundation
>>>>>> https://glemaitre.github.io/
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Guillaume Lemaitre
>>>>>> Scikit-learn @ Inria Foundation
>>>>>> https://glemaitre.github.io/
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Guillaume Lemaitre
>>>>>> Scikit-learn @ Inria Foundation
>>>>>> https://glemaitre.github.io/
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>>
>>> --
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20200109/3a2f21bb/attachment-0001.html>
More information about the scikit-learn
mailing list