[scikit-learn] logistic regression results are not stable between solvers

Benoît Presles benoit.presles at u-bourgogne.fr
Thu Jan 9 09:22:37 EST 2020


Hi Andy,

As you can see in the code, I set C=1e9, so the penalty is negligible and 
the intercept is effectively not penalised with liblinear; that is why I 
get the same solutions from these solvers when everything goes well.

How can I check the objective of the l-bfgs and liblinear solvers with 
sklearn?
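One way to compare them by hand is to evaluate the negative log-likelihood of each fitted model. A minimal sketch (the `objective` helper here is illustrative, not a sklearn API; with C=1e9 the l2 penalty term is negligible, so the summed log-loss is essentially the quantity both solvers minimize):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)

def objective(clf, X, y):
    # Negative log-likelihood, summed over samples (normalize=False).
    # With C=1e9 the penalty contributes essentially nothing, so this
    # is (almost) exactly the objective each solver minimizes.
    return log_loss(y, clf.predict_proba(X), normalize=False)

lbfgs = LogisticRegression(solver='lbfgs', C=1e9, max_iter=20000).fit(X, y)
liblinear = LogisticRegression(solver='liblinear', C=1e9, max_iter=20000).fit(X, y)
print(objective(lbfgs, X, y), objective(liblinear, X, y))
```

If the two objective values are close while the coefficients differ, that points at ill-conditioning rather than a solver bug.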

Best regards,
Ben


On 08/01/2020 21:53, Andreas Mueller wrote:
> Hi Ben.
>
> Liblinear and l-bfgs might both converge but to different solutions, 
> given that the intercept is penalized.
> There are also problems with ill-conditioned data that are hard to 
> detect.
> My impression of SAGA was that the convergence checks are too loose 
> and we should improve them.
> Have you checked the objective of the l-bfgs and liblinear solvers? 
> With ill-conditioned data the objectives could be similar with 
> different solutions.
>
> I don't think scikit-learn is meant to warn about ill-conditioned 
> problems, only about convergence issues.
>
> Hth,
> Andy
>
>
> On 1/8/20 3:31 PM, Benoît Presles wrote:
>> With lbfgs n_iter_ = 48, with saga n_iter_ = 326581, with liblinear 
>> n_iter_ = 64.
>>
>>
>> On 08/01/2020 21:18, Guillaume Lemaître wrote:
>>> We do issue a ConvergenceWarning in that case. Can you check n_iter_ to 
>>> be sure that you actually converged within the stated tolerance?
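A quick way to run that check: compare the fitted `n_iter_` against `max_iter` (here `max_iter` is deliberately tiny to force non-convergence on toy data):

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    clf = LogisticRegression(solver='saga', max_iter=5).fit(X, y)
# n_iter_ is an array (one entry per optimization run); if it reaches
# max_iter, the solver stopped before meeting the tolerance.
if (clf.n_iter_ >= clf.max_iter).any():
    print("solver stopped at max_iter -- results may not have converged")
```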
>>>
>>> On Wed, 8 Jan 2020 at 20:53, Benoît Presles 
>>> <benoit.presles at u-bourgogne.fr 
>>> <mailto:benoit.presles at u-bourgogne.fr>> wrote:
>>>
>>>     Dear sklearn users,
>>>
>>>     I still have some issues concerning logistic regression.
>>>     I did compare on the same data (simulated data) sklearn with
>>>     three different solvers (lbfgs, saga, liblinear) and statsmodels.
>>>
>>>     When everything goes well, I get the same results between lbfgs,
>>>     saga, liblinear and statsmodels. When everything goes wrong, all
>>>     the results are different.
>>>
>>>     In fact, when everything goes wrong, statsmodels gives me a
>>>     convergence warning (Warning: Maximum number of iterations has
>>>     been exceeded. Current function value: inf Iterations: 20000) +
>>>     an error (numpy.linalg.LinAlgError: Singular matrix).
>>>
>>>     Why doesn't sklearn tell me anything? How can I know that I
>>>     have convergence issues with sklearn?
>>>
>>>
>>>     Thanks for your help,
>>>     Best regards,
>>>     Ben
>>>
>>>     --------------------------------------------
>>>
>>>     Here is the code I used to generate synthetic data:
>>>
>>>     from sklearn.datasets import make_classification
>>>     from sklearn.model_selection import StratifiedShuffleSplit
>>>     from sklearn.preprocessing import StandardScaler
>>>     from sklearn.linear_model import LogisticRegression
>>>     import statsmodels.api as sm
>>>     #
>>>     RANDOM_SEED = 2
>>>     #
>>>     X_sim, y_sim = make_classification(n_samples=200,
>>>                                n_features=20,
>>>                                n_informative=10,
>>>                                n_redundant=0,
>>>                                n_repeated=0,
>>>                                n_classes=2,
>>>                                n_clusters_per_class=1,
>>>                                random_state=RANDOM_SEED,
>>>                                shuffle=False)
>>>     #
>>>     sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>>>                                  random_state=RANDOM_SEED)
>>>     for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>>>         X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>>>         y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>>>         ss = StandardScaler()
>>>         X_split_train = ss.fit_transform(X_split_train)
>>>         X_split_test = ss.transform(X_split_test)
>>>         #
>>>         classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                               verbose=0, random_state=RANDOM_SEED,
>>>                                               C=1e9, solver='lbfgs',
>>>                                               penalty='none', tol=1e-6)
>>>         classifier_lbfgs.fit(X_split_train, y_split_train)
>>>         print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>>>         print(classifier_lbfgs.intercept_)
>>>         print(classifier_lbfgs.coef_)
>>>         #
>>>         classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                              verbose=0, random_state=RANDOM_SEED,
>>>                                              C=1e9, solver='saga',
>>>                                              penalty='none', tol=1e-6)
>>>         classifier_saga.fit(X_split_train, y_split_train)
>>>         print('classifier saga iter:', classifier_saga.n_iter_)
>>>         print(classifier_saga.intercept_)
>>>         print(classifier_saga.coef_)
>>>         #
>>>         classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>>>                                                   verbose=0, random_state=RANDOM_SEED,
>>>                                                   C=1e9, solver='liblinear',
>>>                                                   penalty='l2', tol=1e-6)
>>>         classifier_liblinear.fit(X_split_train, y_split_train)
>>>         print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>>>         print(classifier_liblinear.intercept_)
>>>         print(classifier_liblinear.coef_)
>>>         # statsmodels
>>>         logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>>>         logit_res = logit.fit(maxiter=20000)
>>>         print("Coef statsmodels")
>>>         print(logit_res.params)
>>>
>>>
>>>
>>>     On 11/10/2019 15:42, Andreas Mueller wrote:
>>>>
>>>>
>>>>     On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>>>
>>>>>     Thanks for your answers.
>>>>>
>>>>>     On my real data, I do not have so many samples. I have a bit
>>>>>     more than 200 samples in total and I also would like to get
>>>>>     some results with unpenalized logistic regression.
>>>>>     What do you suggest? Should I switch to the lbfgs solver?
>>>>     Yes.
>>>>>     Can I be sure that with this solver I will not have any
>>>>>     convergence issue and will always get the correct result? Indeed, I
>>>>>     did not get any convergence warning with saga, so I thought
>>>>>     everything was fine. I noticed the issues only when I decided
>>>>>     to test several solvers. Without comparing the results across
>>>>>     solvers, how can I be sure that the optimisation goes well?
>>>>>     Shouldn't scikit-learn warn the user somehow if it does not?
>>>>     We should attempt to warn in the SAGA solver if it doesn't
>>>>     converge. That it doesn't raise a convergence warning should
>>>>     probably be considered a bug.
>>>>     It uses the maximum weight change as a stopping criterion right
>>>>     now.
>>>>     We could probably compute the dual objective once in the end to
>>>>     see if we converged, right? Or is that not possible with SAGA?
>>>>     If not, we might want to caution that no convergence warning
>>>>     will be raised.
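In the meantime, one workaround (a sketch, not a sklearn mechanism) is to promote ConvergenceWarning to an error so it cannot pass silently. Note this only helps for solvers that do emit the warning, such as lbfgs; per the above, SAGA may not warn at all:

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
with warnings.catch_warnings():
    # Turn ConvergenceWarning into a raised exception inside this block.
    warnings.simplefilter("error", ConvergenceWarning)
    try:
        LogisticRegression(solver='lbfgs', max_iter=5).fit(X, y)
        converged = True
    except ConvergenceWarning:
        converged = False
print(converged)
```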
>>>>
>>>>>
>>>>>     At last, I was using saga because I also wanted to do some
>>>>>     feature selection by using l1 penalty which is not supported
>>>>>     by lbfgs...
>>>>     You can use liblinear then.
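For instance, l1-penalized liblinear can feed a SelectFromModel step; a sketch (C=0.1 is an arbitrary illustrative value, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)
# l1 penalty drives some coefficients exactly to zero; SelectFromModel
# then keeps only the features with non-negligible coefficients.
l1_clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
selector = SelectFromModel(l1_clf).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)
```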
>>>>
>>>>
>>>>>
>>>>>     Best regards,
>>>>>     Ben
>>>>>
>>>>>
>>>>>     Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>>>>>     Ups I did not see the answer of Roman. Sorry about that. It
>>>>>>     is coming back to the same conclusion :)
>>>>>>
>>>>>>     On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>>>>     <g.lemaitre58 at gmail.com <mailto:g.lemaitre58 at gmail.com>> wrote:
>>>>>>
>>>>>>         Uhm, actually, increasing to 10000 samples solves the
>>>>>>         convergence issue.
>>>>>>         SAGA is most probably not designed to work with such a
>>>>>>         small sample size.
>>>>>>
>>>>>>         On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>>>>         <g.lemaitre58 at gmail.com <mailto:g.lemaitre58 at gmail.com>>
>>>>>>         wrote:
>>>>>>
>>>>>>             I slightly changed the benchmark so that it uses a
>>>>>>             pipeline, and plotted the coefficients:
>>>>>>
>>>>>>             https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>>>>
>>>>>>             I only see one of the 10 splits where SAGA is not
>>>>>>             converging; otherwise the coefficients look very close
>>>>>>             (I don't attach the figure here, but it can be plotted
>>>>>>             using the snippet).
>>>>>>             So apart from this second split, the other differences
>>>>>>             seem to be numerical instability.
>>>>>>
>>>>>>             Where I do have some concern is the convergence rate
>>>>>>             of SAGA, but I have no intuition as to whether this is
>>>>>>             normal or not.
>>>>>>
>>>>>>             On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>>>>             <rth.yurchak at gmail.com
>>>>>>             <mailto:rth.yurchak at gmail.com>> wrote:
>>>>>>
>>>>>>                 Ben,
>>>>>>
>>>>>>                 I can confirm your results with penalty='none'
>>>>>>                 and C=1e9. In both cases,
>>>>>>                 you are running a mostly unpenalized logistic
>>>>>>                 regression. Usually
>>>>>>                 that's less numerically stable than with a small
>>>>>>                 regularization,
>>>>>>                 depending on the data collinearity.
>>>>>>
>>>>>>                 Running that same code with
>>>>>>                   - a larger penalty (smaller C values)
>>>>>>                   - or a larger number of samples
>>>>>>                 yields for me the same coefficients (up to some
>>>>>>                 tolerance).
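Roman's point can be checked directly. A sketch with moderate regularization (C=1.0 and the tolerances are illustrative): lbfgs and saga minimize the same objective, and on a well-conditioned problem their coefficients should agree closely:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=0, random_state=2)
X = StandardScaler().fit_transform(X)
# With C=1.0 the problem is well-regularized, so both solvers should
# converge to (numerically) the same minimizer.
coef = {s: LogisticRegression(solver=s, C=1.0, tol=1e-8,
                              max_iter=50000).fit(X, y).coef_
        for s in ('lbfgs', 'saga')}
print(np.abs(coef['lbfgs'] - coef['saga']).max())
```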
>>>>>>
>>>>>>                 You can also see that SAGA convergence is not
>>>>>>                 good by the fact that it
>>>>>>                 needs 196000 epochs/iterations to converge.
>>>>>>
>>>>>>                 Actually, I have often seen convergence issues
>>>>>>                 with SAG on small
>>>>>>                 datasets (in unit tests), not fully sure why.
>>>>>>
>>>>>>                 -- 
>>>>>>                 Roman
>>>>>>
>>>>>>                 On 09/10/2019 22:10, serafim loukas wrote:
>>>>>>                 > The predictions across solvers are exactly the
>>>>>>                 same when I run the code.
>>>>>>                 > I am using version 0.21.3. What is yours?
>>>>>>                 >
>>>>>>                 >
>>>>>>                 > In [13]: import sklearn
>>>>>>                 >
>>>>>>                 > In [14]: sklearn.__version__
>>>>>>                 > Out[14]: '0.21.3'
>>>>>>                 >
>>>>>>                 >
>>>>>>                 > Serafeim
>>>>>>                 >
>>>>>>                 >
>>>>>>                 >
>>>>>>                 >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>>>>                 <benoit.presles at u-bourgogne.fr
>>>>>>                 <mailto:benoit.presles at u-bourgogne.fr>
>>>>>>                 >> <mailto:benoit.presles at u-bourgogne.fr
>>>>>>                 <mailto:benoit.presles at u-bourgogne.fr>>> wrote:
>>>>>>                 >>
>>>>>>                 >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>>>>                 >
>>>>>>                 >
>>>>>>                 > _______________________________________________
>>>>>>                 > scikit-learn mailing list
>>>>>>                 > scikit-learn at python.org
>>>>>>                 <mailto:scikit-learn at python.org>
>>>>>>                 >
>>>>>>                 https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>                 >
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             -- 
>>>>>>             Guillaume Lemaitre
>>>>>>             Scikit-learn @ Inria Foundation
>>>>>>             https://glemaitre.github.io/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>