[scikit-learn] logistic regression results are not stable between solvers
Benoît Presles
benoit.presles at u-bourgogne.fr
Wed Jan 8 15:31:47 EST 2020
With lbfgs, n_iter_ = 48; with saga, n_iter_ = 326581; with liblinear,
n_iter_ = 64.
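
A minimal sketch of one way to check this after each fit (X_train and
y_train are placeholders for the training split; the warning class is
scikit-learn's ConvergenceWarning):

import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# X_train, y_train are placeholders for any training split below.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf = LogisticRegression(solver='saga', penalty='none',
                             max_iter=10000, tol=1e-6)
    clf.fit(X_train, y_train)

for w in caught:
    print("warning:", w.message)

# Hitting the iteration cap is a strong hint that the solver did not
# converge, even when no ConvergenceWarning was emitted.
if np.any(np.asarray(clf.n_iter_) >= clf.max_iter):
    print("solver stopped at max_iter -- results may not be converged")
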
On 08/01/2020 21:18, Guillaume Lemaître wrote:
> We issue a convergence warning. Can you check n_iter_ to be sure that the
> solvers actually converged to the stated tolerance?
>
> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
> <benoit.presles at u-bourgogne.fr>
> wrote:
>
> Dear sklearn users,
>
> I still have some issues concerning logistic regression.
> I did compare on the same data (simulated data) sklearn with three
> different solvers (lbfgs, saga, liblinear) and statsmodels.
>
> When everything goes well, I get the same results between lbfgs,
> saga, liblinear and statsmodels. When everything goes wrong, all
> the results are different.
>
> In fact, when everything goes wrong, statsmodels gives me a
> convergence warning (Warning: Maximum number of iterations has
> been exceeded. Current function value: inf Iterations: 20000) + an
> error (numpy.linalg.LinAlgError: Singular matrix).
>
> Why doesn't sklearn tell me anything? How can I know that I have
> convergence issues with sklearn?
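>
> One way to make such failures impossible to miss (a sketch, not something
> the snippet below already does) is to escalate scikit-learn's
> ConvergenceWarning into an error:
>
> import warnings
> from sklearn.exceptions import ConvergenceWarning
>
> # Any convergence warning now raises, instead of passing silently.
> warnings.simplefilter("error", ConvergenceWarning)
>
> This only helps for solvers that actually emit the warning, which is
> exactly what is in question here.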
>
>
> Thanks for your help,
> Best regards,
> Ben
>
> --------------------------------------------
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> import statsmodels.api as sm
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=20,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>     # standardise using the training split only
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                           verbose=0, random_state=RANDOM_SEED,
>                                           C=1e9, solver='lbfgs',
>                                           penalty='none', tol=1e-6)
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     print(classifier_lbfgs.intercept_)
>     print(classifier_lbfgs.coef_)
>     #
>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                          verbose=0, random_state=RANDOM_SEED,
>                                          C=1e9, solver='saga',
>                                          penalty='none', tol=1e-6)
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     print(classifier_saga.intercept_)
>     print(classifier_saga.coef_)
>     #
>     # liblinear has no penalty='none'; a very large C approximates it
>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                               verbose=0, random_state=RANDOM_SEED,
>                                               C=1e9, solver='liblinear',
>                                               penalty='l2', tol=1e-6)
>     classifier_liblinear.fit(X_split_train, y_split_train)
>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>     print(classifier_liblinear.intercept_)
>     print(classifier_liblinear.coef_)
>     # statsmodels (unpenalized) for comparison
>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>     logit_res = logit.fit(maxiter=20000)
>     print("Coef statsmodels")
>     print(logit_res.params)
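>
> The coefficients can also be compared numerically rather than by eye; for
> instance, something along these lines (an illustrative addition, not in the
> original run) appended to the end of the loop body:
>
> import numpy as np
>
> # An unpenalized fit should give (nearly) identical coefficients
> # across solvers once each solver has truly converged.
> print('lbfgs vs saga:',
>       np.allclose(classifier_lbfgs.coef_, classifier_saga.coef_, atol=1e-3))
> print('lbfgs vs liblinear:',
>       np.allclose(classifier_lbfgs.coef_, classifier_liblinear.coef_, atol=1e-3))
> print('lbfgs vs statsmodels:',
>       np.allclose(classifier_lbfgs.coef_.ravel(), logit_res.params[1:], atol=1e-3))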
>
>
>
> On 11/10/2019 15:42, Andreas Mueller wrote:
>>
>>
>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>
>>> Thanks for your answers.
>>>
>>> On my real data, I do not have that many samples. I have a bit
>>> more than 200 samples in total, and I would also like to get some
>>> results with unpenalized logistic regression.
>>> What do you suggest? Should I switch to the lbfgs solver?
>> Yes.
>>> Can I be sure that with this solver I will not have any convergence
>>> issue and will always get the right result? Indeed, I did not get any
>>> convergence warning with saga, so I thought everything was fine.
>>> I only noticed some issues when I decided to test several
>>> solvers. Without comparing the results across solvers, how can I be
>>> sure that the optimisation went well? Shouldn't scikit-learn
>>> warn the user somehow when it does not?
>> We should attempt to warn in the SAGA solver if it doesn't
>> converge. That it doesn't raise a convergence warning should
>> probably be considered a bug.
>> It uses the maximum weight change as a stopping criterion right now.
>> We could probably compute the dual objective once at the end to
>> see whether we converged, right? Or is that not possible with SAGA? If
>> not, we might want to caution that no convergence warning will be
>> raised.
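>>
>> As a rough illustration of that idea (not how scikit-learn checks
>> convergence internally), one can evaluate the gradient of the unpenalized
>> log-likelihood at the returned coefficients; a norm far from zero means
>> the solver stopped short. grad_norm here is a helper written only for
>> illustration:
>>
>> import numpy as np
>> from scipy.special import expit
>>
>> def grad_norm(clf, X, y):
>>     # Gradient of the unpenalized negative log-likelihood at the
>>     # fitted coefficients; it should be close to 0 at an optimum.
>>     p = expit(X @ clf.coef_.ravel() + clf.intercept_)
>>     return np.linalg.norm(np.r_[X.T @ (p - y), np.sum(p - y)])
>>
>> Calling grad_norm(classifier_saga, X_split_train, y_split_train) on the
>> snippet's variables then shows directly whether the SAGA solution is
>> actually stationary.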
>>
>>>
>>> Lastly, I was using saga because I also wanted to do some
>>> feature selection using the l1 penalty, which is not supported by
>>> lbfgs...
>> You can use liblinear then.
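>>
>> For example (a sketch of that option; liblinear and saga support the l1
>> penalty, lbfgs does not; C=1.0 is an arbitrary choice here):
>>
>> from sklearn.linear_model import LogisticRegression
>>
>> # Sparse coefficients from the l1 penalty can serve as feature selection.
>> clf_l1 = LogisticRegression(penalty='l1', solver='liblinear',
>>                             C=1.0, max_iter=10000)
>> clf_l1.fit(X_split_train, y_split_train)
>> print((clf_l1.coef_ != 0).sum(), 'features kept')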
>>
>>
>>>
>>> Best regards,
>>> Ben
>>>
>>>
>>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>>>    Oops, I did not see Roman's answer. Sorry about that. It comes
>>>>    back to the same conclusion :)
>>>>
>>>>    On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>>    <g.lemaitre58 at gmail.com> wrote:
>>>>
>>>>        Hmm, actually increasing to 10000 samples solves the
>>>>        convergence issue.
>>>>        SAGA is most probably not designed to work with such a
>>>>        small sample size.
>>>>
>>>>        On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>>        <g.lemaitre58 at gmail.com> wrote:
>>>>
>>>>            I slightly changed the benchmark so that it uses a
>>>>            pipeline, and plotted the coefficients:
>>>>
>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
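>>>>
>>>>            Something along these lines (a rough sketch; the exact
>>>>            benchmark is in the gist above):
>>>>
>>>>            from sklearn.pipeline import make_pipeline
>>>>            from sklearn.preprocessing import StandardScaler
>>>>            from sklearn.linear_model import LogisticRegression
>>>>
>>>>            # Scaling lives inside the pipeline, so it is refit on each
>>>>            # training split only and applied as-is to the test split.
>>>>            model = make_pipeline(
>>>>                StandardScaler(),
>>>>                LogisticRegression(solver='saga', penalty='none',
>>>>                                   max_iter=20000000, tol=1e-6))
>>>>            model.fit(X_sim[train_index_split], y_sim[train_index_split])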
>>>>
>>>>            I only see one of the 10 splits where SAGA is not
>>>>            converging; otherwise the coefficients
>>>>            look very close (I don't attach the figure here, but
>>>>            it can be plotted using the snippet).
>>>>            So apart from this second split, the other differences
>>>>            seem to be numerical instability.
>>>>
>>>>            Where I do have some concern is the convergence
>>>>            rate of SAGA, but I have no
>>>>            intuition as to whether this is normal or not.
>>>>
>>>>            On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>>            <rth.yurchak at gmail.com>
>>>>            wrote:
>>>>
>>>> Ben,
>>>>
>>>> I can confirm your results with penalty='none' and
>>>> C=1e9. In both cases,
>>>>                you are running a mostly unpenalized logistic
>>>>                regression. Usually
>>>> that's less numerically stable than with a small
>>>> regularization,
>>>> depending on the data collinearity.
>>>>
>>>>                Running that same code with
>>>>                - a larger penalty (smaller C values)
>>>>                - or a larger number of samples
>>>>                yields for me the same coefficients (up to some
>>>>                tolerance).
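>>>>
>>>>                For instance (a sketch, not part of the original
>>>>                benchmark, reusing the variables from the snippet
>>>>                above and an arbitrary C=1.0):
>>>>
>>>>                clf = LogisticRegression(C=1.0, penalty='l2',
>>>>                                         solver='lbfgs',
>>>>                                         max_iter=10000, tol=1e-6)
>>>>                clf.fit(X_split_train, y_split_train)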
>>>>
>>>>                You can also see that SAGA's convergence is poor
>>>>                from the fact that it
>>>>                needs 196000 epochs/iterations to converge.
>>>>
>>>> Actually, I have often seen convergence issues with
>>>> SAG on small
>>>> datasets (in unit tests), not fully sure why.
>>>>
>>>> --
>>>> Roman
>>>>
>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>>                > The predictions across solvers are exactly the
>>>>                > same when I run the code.
>>>>                > I am using version 0.21.3. What is yours?
>>>> >
>>>> >
>>>> > In [13]: import sklearn
>>>> >
>>>> > In [14]: sklearn.__version__
>>>> > Out[14]: '0.21.3'
>>>> >
>>>> >
>>>> > Serafeim
>>>> >
>>>> >
>>>> >
>>>>                >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>>                >> <benoit.presles at u-bourgogne.fr> wrote:
>>>> >>
>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>> >
>>>> >
>>>>
>>>
>>
>>
>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>