[scikit-learn] logistic regression results are not stable between solvers

Benoît Presles benoit.presles at u-bourgogne.fr
Wed Jan 8 14:45:59 EST 2020


Dear sklearn users,

I still have some issues concerning logistic regression.
I compared sklearn with three different solvers (lbfgs, saga,
liblinear) and statsmodels on the same simulated data.

When everything goes well, I get the same results from lbfgs, saga,
liblinear and statsmodels. When things go wrong, all the results
are different.

In fact, when things go wrong, statsmodels gives me a convergence
warning (Warning: Maximum number of iterations has been exceeded.
Current function value: inf Iterations: 20000) plus an error
(numpy.linalg.LinAlgError: Singular matrix).

Why doesn't sklearn tell me anything? How can I know that I have
convergence issues with sklearn?
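
The best workaround I have found so far is the sketch below. It reuses
classifier_saga and the training fold from the script further down.
Escalating ConvergenceWarning only helps if the solver emits the
warning at all (which, as discussed below in the thread, saga
apparently does not always do), and the n_iter_ test is my own
heuristic, not an official sklearn API:

import warnings
from sklearn.exceptions import ConvergenceWarning

# escalate ConvergenceWarning to an error so a non-converged fit
# cannot pass silently
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    try:
        classifier_saga.fit(X_split_train, y_split_train)
    except ConvergenceWarning as exc:
        print("saga did not converge:", exc)

# manual fallback: a solver that used all its iterations most likely
# stopped on max_iter rather than on tol
if (classifier_saga.n_iter_ >= classifier_saga.max_iter).any():
    print("saga hit max_iter without reaching tol")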


Thanks for your help,
Best regards,
Ben

--------------------------------------------

Here is the code I used to generate the synthetic data and fit the models:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
#
RANDOM_SEED = 2
#
# simulate a binary classification problem: 200 samples, 20 features
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=20,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train = X_sim[train_index_split]
    X_split_test = X_sim[test_index_split]
    y_split_train = y_sim[train_index_split]
    y_split_test = y_sim[test_index_split]
    # standardize using statistics from the training fold only
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    # unpenalized fit with lbfgs
    classifier_lbfgs = LogisticRegression(fit_intercept=True,
                                          max_iter=20000000, verbose=0,
                                          random_state=RANDOM_SEED, C=1e9,
                                          solver='lbfgs', penalty='none',
                                          tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.intercept_)
    print(classifier_lbfgs.coef_)
    #
    # unpenalized fit with saga
    classifier_saga = LogisticRegression(fit_intercept=True,
                                         max_iter=20000000, verbose=0,
                                         random_state=RANDOM_SEED, C=1e9,
                                         solver='saga', penalty='none',
                                         tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.intercept_)
    print(classifier_saga.coef_)
    #
    # liblinear does not support penalty='none', so use l2 with a huge
    # C to make the penalty negligible
    classifier_liblinear = LogisticRegression(fit_intercept=True,
                                              max_iter=20000000, verbose=0,
                                              random_state=RANDOM_SEED,
                                              C=1e9, solver='liblinear',
                                              penalty='l2', tol=1e-6)
    classifier_liblinear.fit(X_split_train, y_split_train)
    print('classifier liblinear iter:', classifier_liblinear.n_iter_)
    print(classifier_liblinear.intercept_)
    print(classifier_liblinear.coef_)
    # statsmodels: unpenalized logit on the same standardized training fold
    logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
    logit_res = logit.fit(maxiter=20000)
    print("Coef statsmodels")
    print(logit_res.params)



On 11/10/2019 15:42, Andreas Mueller wrote:
>
>
> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>
>> Thanks for your answers.
>>
>> On my real data, I do not have that many samples. I have a bit more
>> than 200 samples in total, and I would also like to get some results
>> with unpenalized logistic regression.
>> What do you suggest? Should I switch to the lbfgs solver?
> Yes.
>> Can I be sure that with this solver I will not have any convergence
>> issues and will always get the correct result? Indeed, I did not get
>> any convergence warning with saga, so I thought everything was fine. I
>> noticed the issues only when I decided to test several solvers. Without
>> comparing the results across solvers, how can I be sure that the
>> optimisation went well? Shouldn't scikit-learn warn the user somehow if
>> it is not the case?
> We should attempt to warn in the SAGA solver if it doesn't converge.
> That it doesn't raise a convergence warning should probably be
> considered a bug.
> It uses the maximum weight change as a stopping criterion right now.
> We could probably compute the dual objective once at the end to see if
> we converged, right? Or is that not possible with SAGA? If not, we
> might want to caution that no convergence warning will be raised.
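>
> As a rough post-hoc check, here is a sketch (a primal gradient check
> rather than the dual objective, and not what the solver does
> internally; it reuses the variable names from Ben's script): at an
> exact optimum of the unpenalized log-loss the gradient is zero, so a
> large gradient norm at the returned solution flags a fit that stopped
> too early.
>
> import numpy as np
> from scipy.special import expit
>
> def grad_inf_norm(clf, X, y):
>     # infinity norm of the unpenalized logistic-loss gradient at the
>     # fitted coefficients; y must be encoded as 0/1
>     p = expit(X @ clf.coef_.ravel() + clf.intercept_)
>     residual = p - y
>     return max(np.abs(X.T @ residual).max(), abs(residual.sum()))
>
> print("lbfgs:", grad_inf_norm(classifier_lbfgs, X_split_train, y_split_train))
> print("saga: ", grad_inf_norm(classifier_saga, X_split_train, y_split_train))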
>
>>
>> Lastly, I was using saga because I also wanted to do some feature
>> selection by using the l1 penalty, which is not supported by lbfgs...
> You can use liblinear then.
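>
> For instance (a minimal sketch; C=1.0 is an arbitrary strength, and
> smaller C gives sparser coefficients):
>
> from sklearn.linear_model import LogisticRegression
>
> # liblinear supports the l1 penalty, unlike lbfgs
> clf_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
> clf_l1.fit(X_split_train, y_split_train)
> print("non-zero coefficients:", (clf_l1.coef_ != 0).sum())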
>
>
>>
>> Best regards,
>> Ben
>>
>>
>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>> Oops, I did not see Roman's answer. Sorry about that. It comes
>>> back to the same conclusion :)
>>>
>>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:
>>>
>>>     Hmm, actually increasing to 10000 samples solves the
>>>     convergence issue.
>>>     Most probably SAGA is not designed to work with such a small
>>>     sample size.
>>>
>>>     On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:
>>>
>>>         I slightly changed the benchmark so that it uses a
>>>         pipeline, and plotted the coefficients:
>>>
>>>         https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>>
>>>         I only see one of the 10 splits where SAGA is not
>>>         converging; otherwise the coefficients
>>>         look very close (I don't attach the figure here but it can
>>>         be plotted using the snippet).
>>>         So apart from this second split, the other differences seem
>>>         to be numerical instability.
>>>
>>>         Where I do have some concern is the convergence rate
>>>         of SAGA, but I have no
>>>         intuition for whether this is normal or not.
>>>
>>>         On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurchak at gmail.com> wrote:
>>>
>>>             Ben,
>>>
>>>             I can confirm your results with penalty='none' and
>>>             C=1e9. In both cases,
>>>             you are running a mostly unpenalized logistic
>>>             regression. Usually
>>>             that's less numerically stable than with a small
>>>             regularization,
>>>             depending on the data collinearity.
>>>
>>>             Running that same code with
>>>               - a larger penalty (smaller C values)
>>>               - or a larger number of samples
>>>             yields for me the same coefficients (up to some
>>>             tolerance).
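>>>
>>>             For example (a quick sketch with an arbitrary C=1.0,
>>>             reusing the variables from Ben's script):
>>>
>>>             import numpy as np
>>>             from sklearn.linear_model import LogisticRegression
>>>
>>>             # with moderate regularization the solvers agree closely
>>>             params = dict(C=1.0, max_iter=10000, tol=1e-6)
>>>             lr_lbfgs = LogisticRegression(solver='lbfgs', **params)
>>>             lr_saga = LogisticRegression(solver='saga', **params)
>>>             lr_lbfgs.fit(X_split_train, y_split_train)
>>>             lr_saga.fit(X_split_train, y_split_train)
>>>             print(np.allclose(lr_lbfgs.coef_, lr_saga.coef_, atol=1e-3))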
>>>
>>>             You can also see that SAGA's convergence is poor from
>>>             the fact that it
>>>             needs 196000 epochs/iterations to converge.
>>>
>>>             Actually, I have often seen convergence issues with SAG
>>>             on small
>>>             datasets (in unit tests); I am not fully sure why.
>>>
>>>             -- 
>>>             Roman
>>>
>>>             On 09/10/2019 22:10, serafim loukas wrote:
>>>             > The predictions across solvers are exactly the same
>>>             > when I run the code.
>>>             > I am using version 0.21.3. What is yours?
>>>             >
>>>             >
>>>             > In [13]: import sklearn
>>>             >
>>>             > In [14]: sklearn.__version__
>>>             > Out[14]: '0.21.3'
>>>             >
>>>             >
>>>             > Serafeim
>>>             >
>>>             >
>>>             >
>>>             >> On 9 Oct 2019, at 21:44, Benoît Presles <benoit.presles at u-bourgogne.fr> wrote:
>>>             >>
>>>             >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>             >
>>>             >
>>>
>>>
>>>
>>>
>>> -- 
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/