[scikit-learn] RFE with logistic regression
Benoît Presles
benoit.presles at u-bourgogne.fr
Wed Jul 25 06:36:55 EDT 2018
Do you think my problems could come from correlated features?
Indeed, in my dataset I have some highly correlated features.
Do you think this could explain why I don't get reproducible and
consistent results?
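
A minimal sketch of how this could be checked, under stated assumptions: the data here is synthetic (from make_classification, with two near-duplicate columns appended), the 0.95 correlation threshold is arbitrary, and the LogisticRegression settings only loosely follow the ones quoted further down in this thread. It lists highly correlated feature pairs and then collects the RFE ranking for each solver so they can be compared side by side.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (an assumption, not the poster's data).
rng = np.random.RandomState(0)
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

# Append near-duplicates of columns 0 and 1 to mimic highly correlated features.
X = np.hstack([X, X[:, :2] + 0.01 * rng.randn(200, 2)])
X = StandardScaler().fit_transform(X)

# Find feature pairs whose absolute Pearson correlation exceeds the threshold.
corr = np.corrcoef(X, rowvar=False)
rows, cols = np.where(np.triu(np.abs(corr), k=1) > 0.95)
correlated_pairs = list(zip(rows.tolist(), cols.tolist()))
print("highly correlated pairs:", correlated_pairs)

# Collect the full RFE ranking (one feature eliminated per step) per solver.
rankings = {}
for solver in ("liblinear", "sag", "saga"):
    clf = LogisticRegression(C=1.0, solver=solver, max_iter=10000,
                             fit_intercept=False, random_state=0)
    rankings[solver] = RFE(clf, n_features_to_select=1).fit(X, y).ranking_
    print(solver, rankings[solver])
```

If the rankings differ mainly in where the members of a correlated pair land, that would be consistent with the correlated features being the source of the instability.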
Thanks for your help,
Ben
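
For reference, the univariate screening suggested in the quoted reply below could look roughly like the following. This is only a sketch under assumptions: the data is synthetic, the ANOVA F-test (f_classif) is one possible scoring function among several, and k=3 is an arbitrary choice. Each feature is scored on its own, so nothing depends on a solver or a random seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)
X = StandardScaler().fit_transform(X)


def univariate_ranking(X, y):
    """Return feature indices sorted from highest to lowest F-score."""
    scores, _ = f_classif(X, y)
    return np.argsort(-scores, kind="stable")


# Scoring is deterministic, so repeated runs give the same ordering.
first = univariate_ranking(X, y)
second = univariate_ranking(X, y)
print("ranking:", first)

# Keep the k best-scoring features (k=3 is arbitrary here).
selector = SelectKBest(f_classif, k=3).fit(X, y)
selected = selector.get_support(indices=True)
print("selected features:", selected)
```

Note that correlated features will tend to receive similar scores, so this gives a stable ordering but does not by itself remove redundancy.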
On 24/07/2018 at 23:44, bthirion wrote:
> Univariate screening is somewhat hackish too, but much more stable --
> and cheap.
> Best,
>
> Bertrand
>
> On 24/07/2018 23:33, Benoît Presles wrote:
>> So you think that I cannot get reproducible and consistent results
>> with this method?
>> If you would avoid RFE, which method do you suggest for finding the
>> best features?
>>
>> Ben
>>
>>
>> On 24/07/2018 at 21:34, Gael Varoquaux wrote:
>>> On Tue, Jul 24, 2018 at 08:43:27PM +0200, Benoît Presles wrote:
>>>> 3. With C=1, it seems that I have the same results at each run for all
>>>> solvers (liblinear, sag and saga), however the ranking is not the same
>>>> between the solvers.
>>> Your problem is probably ill-conditioned, hence the specific weights on
>>> the features are not stable. There isn't a good answer to ordering
>>> features, they are degenerate.
>>>
>>> In general, I would avoid RFE, it is a hack, and can easily lead to
>>> these problems.
>>>
>>> Gaël
>>>
>>>> Thanks for your help,
>>>> Ben
>>>
>>>> PS1: I checked and n_iter_ seems to be always lower than max_iter.
>>>> PS2: my data is scaled, I am using "StandardScaler".
>>>
>>>
>>>> On 24/07/2018 at 20:33, Andreas Mueller wrote:
>>>
>>>>> On 07/24/2018 02:07 PM, Benoît Presles wrote:
>>>>>> I ran the same tests as before, adding fit_intercept=False, and:
>>>>>> 1. I still have the same problem as before, i.e. when I execute
>>>>>> the RFE multiple times I don't get the same ranking each time.
>>>>>> 2. When I change the solver to 'sag'
>>>>>> (classifier_RFE=LogisticRegression(C=1e9, verbose=1, max_iter=10000,
>>>>>> fit_intercept=False, solver='sag')), it seems that I get the same
>>>>>> ranking at each run. This is not the case with the 'saga' solver.
>>>>>> The ranking is not the same between the solvers.
>>>>>> 3. With C=1, it seems that I have the same results at each run for
>>>>>> all solvers (liblinear, sag and saga), however the ranking is not
>>>>>> the same between the solvers.
>>>
>>>>>> How can I get reproducible and consistent results?
>>>>> Did you scale your data? If not, saga and sag will basically fail.
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn