[scikit-learn] LogisticRegression coef_ greater than n_features?
mail at sebastianraschka.com
Tue Jan 8 00:32:22 EST 2019
E.g., if you have a feature with values 'a', 'b', 'c', then applying the one-hot encoder will transform that single feature into 3 features (one binary indicator column per category).
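A minimal sketch of that expansion (the column values and array shapes here are illustrative; `.toarray()` is used because `OneHotEncoder` returns a sparse matrix by default):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with three levels: 'a', 'b', 'c'.
X = np.array([['a'], ['b'], ['c'], ['a']])

enc = OneHotEncoder()
X_hot = enc.fit_transform(X).toarray()

print(X.shape)      # (4, 1) -- one input column
print(X_hot.shape)  # (4, 3) -- three output columns, one per category
```

So a model fit downstream of the encoder sees 3 features where the raw data had 1, and its `coef_` grows accordingly.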
> On Jan 7, 2019, at 11:02 PM, pisymbol <pisymbol at gmail.com> wrote:
> On Mon, Jan 7, 2019 at 11:50 PM pisymbol <pisymbol at gmail.com> wrote:
> According to the doc (0.20.2), coef_ is supposed to have shape (1, n_features) for binary classification. Well, I created a Pipeline and performed a GridSearchCV to create a LogisticRegression model that does fairly well. However, when I went to rank feature importance, I noticed that coef_ for my best_estimator_ has 24 entries while my training data has 22 features.
> What am I missing? How could coef_ > n_features?
> Just a follow-up: I am using a OneHotEncoder to encode two categoricals as part of my pipeline (I am also using an imputer/standard scaler, but I don't see how those could add features).
> Could my pipeline actually add two more features during fitting?
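Yes. A hedged sketch of the scenario described above, using made-up data: 20 numeric columns plus 2 binary categoricals (22 raw features), with the categoricals one-hot encoded inside a ColumnTransformer. Each 2-level categorical becomes 2 indicator columns, so the classifier sees 20 + 2 + 2 = 24 features, and coef_ has 24 entries:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 100

# Hypothetical training data: 20 numeric features + 2 binary categoricals.
num_cols = [f"num{i}" for i in range(20)]
X = pd.DataFrame(rng.normal(size=(n, 20)), columns=num_cols)
X["cat1"] = rng.choice(["a", "b"], size=n)
X["cat2"] = rng.choice(["x", "y"], size=n)
y = rng.integers(0, 2, size=n)

pre = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(), ["cat1", "cat2"]),
])
clf = Pipeline([("pre", pre), ("lr", LogisticRegression())]).fit(X, y)

coef = clf.named_steps["lr"].coef_
print(coef.shape)  # (1, 24): 20 scaled numeric + 4 one-hot columns
```

On recent scikit-learn versions you can call `clf.named_steps["pre"].get_feature_names_out()` to map each of the 24 coefficients back to the transformed column it belongs to.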
> scikit-learn mailing list
> scikit-learn at python.org