[scikit-learn] RFE with logistic regression

Wed Jul 25 07:50:04 EDT 2018

On Wed, Jul 25, 2018 at 12:36:55PM +0200, Benoît Presles wrote:
> Do you think the problems I have can come from correlated features? Indeed,
> in my dataset I have some highly correlated features.

Yes, in general selecting features conditionally on others is very hard
when features are highly correlated.
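
A quick way to check whether this applies to a dataset is to look at the pairwise correlations and the condition number of the design matrix. The following is only a minimal sketch on synthetic data: the array `X`, the near-duplicate columns, and the 0.95 threshold are assumptions made for illustration, not values from this thread.

```python
import numpy as np

# Hypothetical data: 3 independent features, plus 2 near-copies of the
# first two columns (so columns 0/3 and 1/4 are highly correlated).
rng = np.random.RandomState(0)
n_samples = 200
base = rng.normal(size=(n_samples, 3))
X = np.hstack([base, base[:, :2] + 0.01 * rng.normal(size=(n_samples, 2))])

# Pairwise Pearson correlations between columns (features).
corr = np.corrcoef(X, rowvar=False)

# Keep only the upper triangle so each pair is reported once.
threshold = 0.95  # arbitrary cut-off, chosen for illustration
i_idx, j_idx = np.where(np.triu(np.abs(corr), k=1) > threshold)
correlated_pairs = [(int(i), int(j)) for i, j in zip(i_idx, j_idx)]
print("highly correlated pairs:", correlated_pairs)

# A large condition number is another symptom of an ill-conditioned problem.
condition_number = np.linalg.cond(X)
print("condition number:", condition_number)
```

If many pairs show up, the individual weights of those features (and hence any ranking derived from them) should not be expected to be stable.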

> Do you think this could explain why I don't get reproducible and consistent
> results?

Yes.
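
One way to get a selection that does not change between runs is the univariate screening Bertrand mentions further down in this thread. Below is a minimal sketch, assuming `SelectKBest` with the `f_classif` score is an acceptable screening criterion; the synthetic dataset and `k=5` are placeholders, not the original data or a recommended setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset with redundant (correlated) features.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=4,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# Univariate screening: each feature is scored on its own (ANOVA F-test),
# so the score of one feature does not depend on the other features.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
ranking = np.argsort(selector.scores_)[::-1]  # best feature first
print("features ranked by F-score:", ranking)
print("selected mask:", selector.get_support())

# Fitting again on the same data gives the same selection: the scoring
# involves no random initialisation and no iterative solver.
selector_again = SelectKBest(score_func=f_classif, k=5).fit(X, y)
same_selection = np.array_equal(
    selector.get_support(), selector_again.get_support()
)
print("identical selection on refit:", same_selection)
```

Note that this scores features marginally, so correlated features tend to be selected together rather than being ordered against each other.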

> Thanks for your help,
> Ben

> On 24/07/2018 at 23:44, bthirion wrote:
> > Univariate screening is somewhat hackish too, but much more stable --
> > and cheap.
> > Best,

> > Bertrand

> > On 24/07/2018 23:33, Benoît Presles wrote:
> > > So you think that I cannot get reproducible and consistent results
> > > with this method?
> > > If you would avoid RFE, which method do you suggest for finding the
> > > best features?

> > > Ben

> > > On 24/07/2018 at 21:34, Gael Varoquaux wrote:
> > > > On Tue, Jul 24, 2018 at 08:43:27PM +0200, Benoît Presles wrote:
> > > > > 3. With C=1, it seems that I have the same results at each run for all
> > > > > solvers (liblinear, sag and saga), however the ranking is not the same
> > > > > between the solvers.
> > > > Your problem is probably ill-conditioned, hence the specific weights on
> > > > the features are not stable. There isn't a good answer to ordering
> > > > features: they are degenerate.

> > > > In general, I would avoid RFE, it is a hack, and can easily lead
> > > > to these problems.

> > > > Gaël

> > > > > Thanks for your help,
> > > > > Ben

> > > > > PS1: I checked and n_iter_ seems to be always lower than max_iter.
> > > > > PS2: my data is scaled, I am using "StandardScaler".

> > > > > On 24/07/2018 at 20:33, Andreas Mueller wrote:

> > > > > > On 07/24/2018 02:07 PM, Benoît Presles wrote:
> > > > > > > I did the same tests as before adding fit_intercept=False and:
> > > > > > > 1. I have got the same problem as before, i.e. when I execute the
> > > > > > > RFE multiple times I don't get the same ranking each time.
> > > > > > > 2. When I change the solver to 'sag'
> > > > > > > (classifier_RFE=LogisticRegression(C=1e9, verbose=1, max_iter=10000,
> > > > > > > fit_intercept=False, solver='sag')), it seems that I get the same
> > > > > > > ranking at each run. This is not the case with the 'saga' solver.
> > > > > > > The ranking is not the same between the solvers.
> > > > > > > 3. With C=1, it seems that I have the same results at each run for
> > > > > > > all solvers (liblinear, sag and saga), however the ranking is not
> > > > > > > the same between the solvers.

> > > > > > > How can I get reproducible and consistent results?
> > > > > > Did you scale your data? If not, saga and sag will basically fail.
> > > > > > _______________________________________________
> > > > > > scikit-learn mailing list
> > > > > > scikit-learn at python.org
> > > > > > https://mail.python.org/mailman/listinfo/scikit-learn

-- 
    Gael Varoquaux
    Senior Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux