[scikit-learn] [Feature] drop_one in one hot encoder

Sun Jun 25 12:06:24 EDT 2017

Hi,

Hm, I think that dropping a column from one-hot encoded features is quite uncommon in machine learning practice, at least based on the applications and implementations I've seen. My guess is that the one-hot encoded features are multicollinear anyway!? There may be certain algorithms that benefit from dropping a column, though (e.g., linear regression as a simple example). For instance, pandas' get_dummies has a "drop_first" parameter ...
I think it would make sense to have such a parameter in OneHotEncoder as well, e.g., for working with pipelines.
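
To make the linear regression point a bit more concrete, here is a rough sketch using pandas' get_dummies with and without drop_first. The toy "color" column and all variable names are made up for illustration, and the dtype argument assumes a reasonably recent pandas version. It only shows the pandas side; it does not assume any drop option exists in OneHotEncoder.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one categorical feature with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full one-hot encoding: one indicator column per category.
full = pd.get_dummies(df["color"], dtype=float)

# Same encoding with the first category (alphabetically "blue") dropped.
reduced = pd.get_dummies(df["color"], drop_first=True, dtype=float)

# Add an intercept column, as an ordinary linear regression would.
intercept = np.ones(len(df))
X_full = np.column_stack([intercept, full.to_numpy()])
X_reduced = np.column_stack([intercept, reduced.to_numpy()])

# The full indicator columns sum to the intercept column, so X_full is
# rank deficient; dropping one column restores full column rank.
rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print(X_full.shape[1], rank_full)        # number of columns vs. rank
print(X_reduced.shape[1], rank_reduced)  # number of columns vs. rank
```

With the full encoding plus an intercept, the design matrix has more columns than its rank, which is the case where an unregularized linear model has no unique solution. Regularized models and tree-based models typically do not depend on this.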

Best,
Sebastian

> On Jun 25, 2017, at 7:48 AM, Parminder Singh <parmsingh129 at gmail.com> wrote:
> 
> Hi Sci-kittens! :-)
> 
> I was doing the Machine Learning A-Z course on Udemy, where they said that every time one-hot encoding is done, one of the columns should be dropped, since keeping it is like encoding the same category twice and is redundant to the model. I thought that instead of having the user find the index and drop it after preprocessing, OneHotEncoder could have a drop_one parameter that automatically removes the last column. What are your thoughts about this? I am new to this community and would like to contribute this myself if it is a possible addition.
> 
> Thanks,
> Trion129