[scikit-learn] logistic regression results are not stable between solvers
Benoît Presles
benoit.presles at u-bourgogne.fr
Wed Jan 8 15:31:47 EST 2020
With lbfgs, n_iter_ = 48; with saga, n_iter_ = 326581; with liblinear,
n_iter_ = 64.
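
A minimal sketch of one way to check this after each fit (X_train and
y_train are placeholders for the training split; the warning class is
scikit-learn's ConvergenceWarning):

import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# X_train, y_train are placeholders for any training split below.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    clf = LogisticRegression(solver='saga', penalty='none',
                             max_iter=10000, tol=1e-6)
    clf.fit(X_train, y_train)

for w in caught:
    print("warning:", w.message)

# Hitting the iteration cap is a strong hint that the solver did not
# converge, even when no ConvergenceWarning was emitted.
if np.any(np.asarray(clf.n_iter_) >= clf.max_iter):
    print("solver stopped at max_iter -- results may not be converged")
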
On 08/01/2020 21:18, Guillaume Lemaître wrote:
> We issue a convergence warning. Can you check n_iter_ to be sure that the
> solvers actually converged to the stated tolerance?
>
> On Wed, 8 Jan 2020 at 20:53, Benoît Presles
> <benoit.presles at u-bourgogne.fr>
> wrote:
>
> Dear sklearn users,
>
> I still have some issues concerning logistic regression.
> I did compare on the same data (simulated data) sklearn with three
> different solvers (lbfgs, saga, liblinear) and statsmodels.
>
> When everything goes well, I get the same results between lbfgs,
> saga, liblinear and statsmodels. When everything goes wrong, all
> the results are different.
>
> In fact, when everything goes wrong, statsmodels gives me a
> convergence warning (Warning: Maximum number of iterations has
> been exceeded. Current function value: inf Iterations: 20000) + an
> error (numpy.linalg.LinAlgError: Singular matrix).
>
> Why doesn't sklearn tell me anything? How can I know that I have
> convergence issues with sklearn?
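>
> One way to make such failures impossible to miss (a sketch, not something
> the snippet below already does) is to escalate scikit-learn's
> ConvergenceWarning into an error:
>
> import warnings
> from sklearn.exceptions import ConvergenceWarning
>
> # Any convergence warning now raises, instead of passing silently.
> warnings.simplefilter("error", ConvergenceWarning)
>
> This only helps for solvers that actually emit the warning, which is
> exactly what is in question here.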
>
>
> Thanks for your help,
> Best regards,
> Ben
>
> --------------------------------------------
>
> Here is the code I used to generate synthetic data:
>
> from sklearn.datasets import make_classification
> from sklearn.model_selection import StratifiedShuffleSplit
> from sklearn.preprocessing import StandardScaler
> from sklearn.linear_model import LogisticRegression
> import statsmodels.api as sm
> #
> RANDOM_SEED = 2
> #
> X_sim, y_sim = make_classification(n_samples=200,
>                                    n_features=20,
>                                    n_informative=10,
>                                    n_redundant=0,
>                                    n_repeated=0,
>                                    n_classes=2,
>                                    n_clusters_per_class=1,
>                                    random_state=RANDOM_SEED,
>                                    shuffle=False)
> #
> sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
>                              random_state=RANDOM_SEED)
> for train_index_split, test_index_split in sss.split(X_sim, y_sim):
>     X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
>     y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
>     # standardise using the training split only
>     ss = StandardScaler()
>     X_split_train = ss.fit_transform(X_split_train)
>     X_split_test = ss.transform(X_split_test)
>     #
>     classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                           verbose=0, random_state=RANDOM_SEED,
>                                           C=1e9, solver='lbfgs',
>                                           penalty='none', tol=1e-6)
>     classifier_lbfgs.fit(X_split_train, y_split_train)
>     print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
>     print(classifier_lbfgs.intercept_)
>     print(classifier_lbfgs.coef_)
>     #
>     classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                          verbose=0, random_state=RANDOM_SEED,
>                                          C=1e9, solver='saga',
>                                          penalty='none', tol=1e-6)
>     classifier_saga.fit(X_split_train, y_split_train)
>     print('classifier saga iter:', classifier_saga.n_iter_)
>     print(classifier_saga.intercept_)
>     print(classifier_saga.coef_)
>     #
>     # liblinear has no penalty='none'; a very large C approximates it
>     classifier_liblinear = LogisticRegression(fit_intercept=True, max_iter=20000000,
>                                               verbose=0, random_state=RANDOM_SEED,
>                                               C=1e9, solver='liblinear',
>                                               penalty='l2', tol=1e-6)
>     classifier_liblinear.fit(X_split_train, y_split_train)
>     print('classifier liblinear iter:', classifier_liblinear.n_iter_)
>     print(classifier_liblinear.intercept_)
>     print(classifier_liblinear.coef_)
>     # statsmodels (unpenalized) for comparison
>     logit = sm.Logit(y_split_train, sm.tools.add_constant(X_split_train))
>     logit_res = logit.fit(maxiter=20000)
>     print("Coef statsmodels")
>     print(logit_res.params)
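>
> The coefficients can also be compared numerically rather than by eye; for
> instance, something along these lines (an illustrative addition, not in the
> original run) appended to the end of the loop body:
>
> import numpy as np
>
> # An unpenalized fit should give (nearly) identical coefficients
> # across solvers once each solver has truly converged.
> print('lbfgs vs saga:',
>       np.allclose(classifier_lbfgs.coef_, classifier_saga.coef_, atol=1e-3))
> print('lbfgs vs liblinear:',
>       np.allclose(classifier_lbfgs.coef_, classifier_liblinear.coef_, atol=1e-3))
> print('lbfgs vs statsmodels:',
>       np.allclose(classifier_lbfgs.coef_.ravel(), logit_res.params[1:], atol=1e-3))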
>
>
>
> On 11/10/2019 15:42, Andreas Mueller wrote:
>>
>>
>> On 10/10/19 1:14 PM, Benoît Presles wrote:
>>>
>>> Thanks for your answers.
>>>
>>> On my real data, I do not have that many samples. I have a bit
>>> more than 200 samples in total, and I would also like to get some
>>> results with unpenalized logistic regression.
>>> What do you suggest? Should I switch to the lbfgs solver?
>> Yes.
>>> Can I be sure that with this solver I will not have any convergence
>>> issue and will always get the right result? Indeed, I did not get any
>>> convergence warning with saga, so I thought everything was fine.
>>> I only noticed some issues when I decided to test several
>>> solvers. Without comparing the results across solvers, how can I be
>>> sure that the optimisation went well? Shouldn't scikit-learn
>>> warn the user somehow when it does not?
>> We should attempt to warn in the SAGA solver if it doesn't
>> converge. That it doesn't raise a convergence warning should
>> probably be considered a bug.
>> It uses the maximum weight change as a stopping criterion right now.
>> We could probably compute the dual objective once at the end to
>> see whether we converged, right? Or is that not possible with SAGA? If
>> not, we might want to caution that no convergence warning will be
>> raised.
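>>
>> As a rough illustration of that idea (not how scikit-learn checks
>> convergence internally), one can evaluate the gradient of the unpenalized
>> log-likelihood at the returned coefficients; a norm far from zero means
>> the solver stopped short. grad_norm here is a helper written only for
>> illustration:
>>
>> import numpy as np
>> from scipy.special import expit
>>
>> def grad_norm(clf, X, y):
>>     # Gradient of the unpenalized negative log-likelihood at the
>>     # fitted coefficients; it should be close to 0 at an optimum.
>>     p = expit(X @ clf.coef_.ravel() + clf.intercept_)
>>     return np.linalg.norm(np.r_[X.T @ (p - y), np.sum(p - y)])
>>
>> Calling grad_norm(classifier_saga, X_split_train, y_split_train) on the
>> snippet's variables then shows directly whether the SAGA solution is
>> actually stationary.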
>>
>>>
>>> Lastly, I was using saga because I also wanted to do some
>>> feature selection using the l1 penalty, which is not supported by
>>> lbfgs...
>> You can use liblinear then.
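>>
>> For example (a sketch of that option; liblinear and saga support the l1
>> penalty, lbfgs does not; C=1.0 is an arbitrary choice here):
>>
>> from sklearn.linear_model import LogisticRegression
>>
>> # Sparse coefficients from the l1 penalty can serve as feature selection.
>> clf_l1 = LogisticRegression(penalty='l1', solver='liblinear',
>>                             C=1.0, max_iter=10000)
>> clf_l1.fit(X_split_train, y_split_train)
>> print((clf_l1.coef_ != 0).sum(), 'features kept')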
>>
>>
>>>
>>> Best regards,
>>> Ben
>>>
>>>
>>> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>>>>    Oops, I did not see Roman's answer. Sorry about that. It comes
>>>>    back to the same conclusion :)
>>>>
>>>>    On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître
>>>>    <g.lemaitre58 at gmail.com> wrote:
>>>>
>>>>        Hmm, actually increasing to 10000 samples solves the
>>>>        convergence issue.
>>>>        SAGA is most probably not designed to work with such a
>>>>        small sample size.
>>>>
>>>>        On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître
>>>>        <g.lemaitre58 at gmail.com> wrote:
>>>>
>>>>            I slightly changed the benchmark so that it uses a
>>>>            pipeline, and plotted the coefficients:
>>>>
>>>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
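>>>>
>>>>            Something along these lines (a rough sketch; the exact
>>>>            benchmark is in the gist above):
>>>>
>>>>            from sklearn.pipeline import make_pipeline
>>>>            from sklearn.preprocessing import StandardScaler
>>>>            from sklearn.linear_model import LogisticRegression
>>>>
>>>>            # Scaling lives inside the pipeline, so it is refit on each
>>>>            # training split only and applied as-is to the test split.
>>>>            model = make_pipeline(
>>>>                StandardScaler(),
>>>>                LogisticRegression(solver='saga', penalty='none',
>>>>                                   max_iter=20000000, tol=1e-6))
>>>>            model.fit(X_sim[train_index_split], y_sim[train_index_split])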
>>>>
>>>>            I only see one of the 10 splits where SAGA is not
>>>>            converging; otherwise the coefficients
>>>>            look very close (I don't attach the figure here, but
>>>>            it can be plotted using the snippet).
>>>>            So apart from this second split, the other differences
>>>>            seem to be numerical instability.
>>>>
>>>>            Where I do have some concern is the convergence
>>>>            rate of SAGA, but I have no
>>>>            intuition as to whether this is normal or not.
>>>>
>>>>            On Wed, 9 Oct 2019 at 23:22, Roman Yurchak
>>>>            <rth.yurchak at gmail.com>
>>>>            wrote:
>>>>
>>>> Ben,
>>>>
>>>> I can confirm your results with penalty='none' and
>>>> C=1e9. In both cases,
>>>>                you are running a mostly unpenalized logistic
>>>>                regression. Usually
>>>> that's less numerically stable than with a small
>>>> regularization,
>>>> depending on the data collinearity.
>>>>
>>>>                Running that same code with
>>>>                - a larger penalty (smaller C values)
>>>>                - or a larger number of samples
>>>>                yields for me the same coefficients (up to some
>>>>                tolerance).
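>>>>
>>>>                For instance (a sketch, not part of the original
>>>>                benchmark, reusing the variables from the snippet
>>>>                above and an arbitrary C=1.0):
>>>>
>>>>                clf = LogisticRegression(C=1.0, penalty='l2',
>>>>                                         solver='lbfgs',
>>>>                                         max_iter=10000, tol=1e-6)
>>>>                clf.fit(X_split_train, y_split_train)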
>>>>
>>>>                You can also see that SAGA's convergence is poor
>>>>                from the fact that it
>>>>                needs 196000 epochs/iterations to converge.
>>>>
>>>> Actually, I have often seen convergence issues with
>>>> SAG on small
>>>> datasets (in unit tests), not fully sure why.
>>>>
>>>> --
>>>> Roman
>>>>
>>>> On 09/10/2019 22:10, serafim loukas wrote:
>>>>                > The predictions across solvers are exactly the
>>>>                > same when I run the code.
>>>>                > I am using version 0.21.3. What is yours?
>>>> >
>>>> >
>>>> > In [13]: import sklearn
>>>> >
>>>> > In [14]: sklearn.__version__
>>>> > Out[14]: '0.21.3'
>>>> >
>>>> >
>>>> > Serafeim
>>>> >
>>>> >
>>>> >
>>>>                >> On 9 Oct 2019, at 21:44, Benoît Presles
>>>>                >> <benoit.presles at u-bourgogne.fr> wrote:
>>>> >>
>>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>>> >
>>>> >
>>>>
>>>
>>
>>
>
>
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>