[scikit-learn] RFE with logistic regression
Gael Varoquaux
gael.varoquaux at normalesup.org
Wed Jul 25 07:50:04 EDT 2018
On Wed, Jul 25, 2018 at 12:36:55PM +0200, Benoît Presles wrote:
> Do you think the problems I have can come from correlated features? Indeed,
> in my dataset I have some highly correlated features.
Yes, in general selecting features conditionally on others is very hard
when features are highly correlated.
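To see the effect on a toy problem, here is a minimal sketch. It assumes synthetic data in which three features are near-copies of one signal. The data, the helper name rfe_ranking and the bootstrap loop are all made up for illustration; none of it comes from your setup. It refits RFE on resampled data and collects the rankings, so you can check whether the order of the correlated features changes from one resample to the next.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples = 200

# Hypothetical data: features 0-2 are near-duplicates of the same signal,
# feature 3 is an independent informative feature, feature 4 is pure noise.
signal = rng.normal(size=n_samples)
other = rng.normal(size=n_samples)
X = np.column_stack([
    signal + 0.01 * rng.normal(size=n_samples),
    signal + 0.01 * rng.normal(size=n_samples),
    signal + 0.01 * rng.normal(size=n_samples),
    other,
    rng.normal(size=n_samples),
])
y = (signal + other + 0.5 * rng.normal(size=n_samples) > 0).astype(int)
X = StandardScaler().fit_transform(X)


def rfe_ranking(X, y):
    """Return the full RFE ranking (1 = eliminated last) for one fit."""
    estimator = LogisticRegression(C=1.0, max_iter=1000)
    selector = RFE(estimator, n_features_to_select=1)
    return selector.fit(X, y).ranking_


# Refit on bootstrap resamples and collect one ranking per resample.
rankings = []
for seed in range(20):
    idx = np.random.RandomState(seed).choice(
        n_samples, size=n_samples, replace=True
    )
    rankings.append(rfe_ranking(X[idx], y[idx]))
rankings = np.array(rankings)

corr = np.corrcoef(X, rowvar=False)
n_distinct = len({tuple(r) for r in rankings})
print("correlation between features 0 and 1:", corr[0, 1])
print("distinct rankings over 20 resamples:", n_distinct)
print(rankings)
```

If the rows of the printed array disagree on the order of features 0-2, that is the degeneracy at work: the data cannot tell these features apart, so any ordering among them is arbitrary.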
> Do you think this could explain why I don't get reproducible and consistent
> results?
Yes.
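For the univariate screening that Bertrand suggests further down the thread, a minimal sketch could look like the following. It assumes SelectKBest with the ANOVA F-score (f_classif) as the screening criterion, synthetic data from make_classification, and k=4. All of these are illustrative choices, not something prescribed in this thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical data: 3 informative features plus 3 redundant ones, which are
# linear combinations of the informative features and hence correlated.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=3,
    random_state=0,
)

# Score each feature on its own, independently of the others,
# and keep the k best-scoring ones.
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)
scores = selector.scores_
selected = np.flatnonzero(selector.get_support())

print("F-scores:", np.round(scores, 2))
print("selected feature indices:", selected)
```

Each feature is scored without conditioning on the others, so correlated features do not compete for weight in a joint model. The flip side is that several redundant features may all be kept.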
> Thanks for your help,
> Ben
> On 24/07/2018 at 23:44, bthirion wrote:
> > Univariate screening is somewhat hackish too, but much more stable --
> > and cheap.
> > Best,
> > Bertrand
> > On 24/07/2018 23:33, Benoît Presles wrote:
> > > So you think that I cannot get reproducible and consistent results
> > > with this method?
> > > If you would avoid RFE, which method would you suggest for finding
> > > the best features?
> > > Ben
> > > On 24/07/2018 at 21:34, Gael Varoquaux wrote:
> > > > On Tue, Jul 24, 2018 at 08:43:27PM +0200, Benoît Presles wrote:
> > > > > 3. With C=1, it seems that I have the same results at each run for all
> > > > > solvers (liblinear, sag and saga), however the ranking is not the same
> > > > > between the solvers.
> > > > Your problem is probably ill-conditioned, hence the specific weights on
> > > > the features are not stable. There is no good answer to ordering the
> > > > features: they are degenerate.
> > > > In general, I would avoid RFE: it is a hack, and can easily lead to
> > > > these problems.
> > > > Gaël
> > > > > Thanks for your help,
> > > > > Ben
> > > > > PS1: I checked and n_iter_ seems to be always lower than max_iter.
> > > > > PS2: my data is scaled, I am using "StandardScaler".
> > > > > On 24/07/2018 at 20:33, Andreas Mueller wrote:
> > > > > > On 07/24/2018 02:07 PM, Benoît Presles wrote:
> > > > > > > I ran the same tests as before, adding fit_intercept=False, and:
> > > > > > > 1. I have the same problem as before, i.e. when I run the
> > > > > > > RFE multiple times I don't get the same ranking each time.
> > > > > > > 2. When I change the solver to 'sag'
> > > > > > > (classifier_RFE=LogisticRegression(C=1e9, verbose=1, max_iter=10000,
> > > > > > > fit_intercept=False, solver='sag')), it seems that I get the same
> > > > > > > ranking at each run. This is not the case with the 'saga' solver.
> > > > > > > The ranking is not the same between the solvers.
> > > > > > > 3. With C=1, it seems that I have the same results at each run for
> > > > > > > all solvers (liblinear, sag and saga), however the ranking is not
> > > > > > > the same between the solvers.
> > > > > > > How can I get reproducible and consistent results?
> > > > > > Did you scale your data? If not, saga and sag will basically fail.
> > > > > > _______________________________________________
> > > > > > scikit-learn mailing list
> > > > > > scikit-learn at python.org
> > > > > > https://mail.python.org/mailman/listinfo/scikit-learn
--
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux