[scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

C W tmrsg11 at gmail.com
Thu Apr 30 23:08:44 EDT 2020


Hermes,

That's an interesting function. Does it work with sklearn after factorize?
Is there any example? Thanks!
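
For reference, here is a minimal sketch of what feeding pd.factorize output
into a scikit-learn estimator might look like. The toy DataFrame, its column
names, and the choice of DecisionTreeClassifier are assumptions made up for
illustration; nothing here comes from the thread itself.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
    "label": [0, 1, 1, 1, 0, 1],
})

# pd.factorize returns integer codes plus the unique values,
# ordered by first appearance in the column.
codes, uniques = pd.factorize(df["color"])
df["color_code"] = codes

X = df[["color_code", "size"]]
y = df["label"]

# A tree can split on the integer codes directly.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
train_pred = clf.predict(X)

# New data has to reuse the mapping learned on the training column.
# Index.get_indexer gives -1 for categories that were never seen.
new_codes = uniques.get_indexer(["blue", "red", "purple"])
```

Note that the integer codes carry an arbitrary order. A tree can still split
on them, but a linear model would read them as magnitudes, so this is not a
general replacement for one-hot encoding. Inside a Pipeline,
sklearn.preprocessing.OrdinalEncoder plays a similar role and remembers the
mapping between fit and transform.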

On Thu, Apr 30, 2020 at 6:51 PM Hermes Morales <paisanohermes at hotmail.com>
wrote:

> Perhaps pd.factorize could help?
>
> ------------------------------
> *From:* scikit-learn <scikit-learn-bounces+paisanohermes=
> hotmail.com at python.org> on behalf of Gael Varoquaux <
> gael.varoquaux at normalesup.org>
> *Sent:* Thursday, April 30, 2020 5:12:06 PM
> *To:* Scikit-learn mailing list <scikit-learn at python.org>
> *Subject:* Re: [scikit-learn] Why does sklearn require one-hot-encoding
> for categorical features? Can we have a "factor" data type?
>
> On Thu, Apr 30, 2020 at 03:55:00PM -0400, C W wrote:
> > I've used R and Stata; neither needs such a transformation. They have a
> > data type called "factors", which is different from "numeric".
>
> > My problem with OHE:
> > One-hot encoding results in a large number of features. This really blows
> > up quickly, and I have to fight the curse of dimensionality with PCA
> > reduction. That's not cool!
>
> Most statistical models still do one-hot encoding under the hood. So R
> and Stata do it too.
>
> Typically, tree-based models can be adapted to work directly on
> categorical data. Ours don't. It's work in progress.
>
> G
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>