[scikit-learn] logistic regression results are not stable between solvers

Guillaume Lemaître g.lemaitre58 at gmail.com
Wed Oct 9 17:39:05 EDT 2019


Oops, I did not see Roman's answer. Sorry about that. It comes to the same
conclusion :)

On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître <g.lemaitre58 at gmail.com>
wrote:

> Hmm, actually increasing to 10000 samples solves the convergence issue.
> SAGA is most probably not designed to work with such a small sample size.
>
> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître <g.lemaitre58 at gmail.com>
> wrote:
>
>> I slightly changed the benchmark so that it uses a pipeline, and plotted
>> the coefficients:
>>
>> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>>
>> I only see one of the 10 splits where SAGA is not converging; otherwise
>> the coefficients look very close (I don't attach the figure here, but it
>> can be plotted using the snippet).
>> So apart from this second split, the other differences seem to be due to
>> numerical instability.
>>
>> My remaining concern is the convergence rate of SAGA, but I have no
>> intuition to know whether this is normal or not.
>>
>> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurchak at gmail.com> wrote:
>>
>>> Ben,
>>>
>>> I can confirm your results with penalty='none' and C=1e9. In both cases,
>>> you are running a mostly unpenalized logistic regression. Usually
>>> that's less numerically stable than with a small regularization,
>>> depending on the data collinearity.
>>>
>>> Running that same code with
>>>   - a larger penalty (smaller C values)
>>>   - or a larger number of samples
>>>   yields the same coefficients for me (up to some tolerance).
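
The effect described above can be checked with a small sketch. It is only an illustration: the synthetic data from make_classification, the helper name max_coef_gap, and the chosen tol/max_iter values are assumptions and not the benchmark from the original report. It fits the same scaled data with lbfgs and saga and returns the largest absolute difference between the two coefficient vectors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def max_coef_gap(n_samples, C, max_iter=10000, tol=1e-8, seed=0):
    """Largest absolute coefficient difference between lbfgs and saga."""
    # Synthetic data: an assumption, not the data from the original report.
    X, y = make_classification(
        n_samples=n_samples, n_features=10, n_informative=5,
        n_redundant=0, random_state=seed,
    )
    coefs = []
    for solver in ("lbfgs", "saga"):
        # Scaling first, since SAG/SAGA are sensitive to feature scale.
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(solver=solver, C=C, tol=tol,
                               max_iter=max_iter, random_state=seed),
        )
        model.fit(X, y)
        coefs.append(model[-1].coef_.ravel())
    return float(np.max(np.abs(coefs[0] - coefs[1])))


if __name__ == "__main__":
    # Nearly unpenalized on few samples, then more penalty, then more data.
    for n_samples, C in [(100, 1e9), (100, 1.0), (10000, 1e9)]:
        print(n_samples, C, max_coef_gap(n_samples, C, max_iter=1000))
```

C=1e9 stands in for the unpenalized case so that the snippet does not depend on how a given scikit-learn version spells penalty='none'. The nearly unpenalized small-sample run may emit convergence warnings, which is the behaviour under discussion.
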
>>>
>>> You can also see that SAGA convergence is not good by the fact that it
>>> needs 196000 epochs/iterations to converge.
>>>
>>> Actually, I have often seen convergence issues with SAG on small
>>> datasets (in unit tests), not fully sure why.
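
One way to look at this kind of convergence problem is to record whether the fit raised a ConvergenceWarning and how many epochs it used (n_iter_). The sketch below is a hedged illustration: the helper name saga_convergence and the synthetic data are assumptions, and it does not reproduce the 196000-epoch run mentioned above.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def saga_convergence(n_samples, C, max_iter, tol=1e-4, seed=0):
    """Return (converged, n_epochs) for a SAGA logistic regression fit."""
    # Synthetic data: an assumption, not the data from the original report.
    X, y = make_classification(
        n_samples=n_samples, n_features=10, n_informative=5,
        n_redundant=0, random_state=seed,
    )
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver="saga", C=C, tol=tol,
                           max_iter=max_iter, random_state=seed),
    )
    # Record warnings instead of printing them, so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return (not warned), int(model[-1].n_iter_[0])


if __name__ == "__main__":
    print(saga_convergence(100, 1e9, max_iter=1000))
    print(saga_convergence(10000, 1e9, max_iter=1000))
```

Comparing the returned epoch count against max_iter for different sample sizes and C values gives a rough picture of how hard the problem is for SAGA, without having to compare coefficients.
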
>>>
>>> --
>>> Roman
>>>
>>> On 09/10/2019 22:10, serafim loukas wrote:
>>> > The predictions across solvers are exactly the same when I run the code.
>>> > I am using version 0.21.3. What is yours?
>>> >
>>> >
>>> > In [13]: import sklearn
>>> >
>>> > In [14]: sklearn.__version__
>>> > Out[14]: '0.21.3'
>>> >
>>> >
>>> > Serafeim
>>> >
>>> >
>>> >
>>> >> On 9 Oct 2019, at 21:44, Benoît Presles
>>> >> <benoit.presles at u-bourgogne.fr> wrote:
>>> >>
>>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>>> >
>>> >
>>> > _______________________________________________
>>> > scikit-learn mailing list
>>> > scikit-learn at python.org
>>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>> >
>>>
>>
>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>


-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/