[scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?

C W tmrsg11 at gmail.com
Sat Sep 14 00:41:06 EDT 2019


Thanks, Sebastian. It's great to know that it works; I just need to do
one-hot encoding first.
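
To make sure I understand the encoding step, here is a minimal sketch on the
toy table from my first message, assuming pandas is available. The names `df`
and `X_encoded` and the choice of `pd.get_dummies` are my own for
illustration; nothing here comes from the scikit-learn documentation.

```python
import pandas as pd

# Toy data copied from the example table in this thread.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Separate the features from the target column.
X = df.drop(columns="Attendance")
y = df["Attendance"]

# One-hot encode only the categorical columns; Age and Income are left as-is.
X_encoded = pd.get_dummies(X, columns=["Gender", "Car"])

print(list(X_encoded.columns))
```

Each category value becomes its own binary column, so "Car" turns into three
columns and "Gender" into two.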

I have mixed data types (continuous and categorical). Should I use
tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?

I'm guessing tree.DecisionTreeClassifier()?
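
My reasoning, for what it's worth: as far as I can tell, the choice depends on
the target rather than on the feature types, and Attendance is a Yes/No label.
Below is a rough sketch under that assumption, using ColumnTransformer and
OneHotEncoder from scikit-learn. The variable names, `random_state=0`, and the
`handle_unknown="ignore"` setting are my own choices, not a recommendation
from the docs.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data copied from the example table in this thread.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})
X = df[["Gender", "Age", "Income", "Car"]]
y = df["Attendance"]  # categorical target, hence a classifier

# One-hot encode the categorical columns; pass numeric columns through.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Car"]),
    ],
    remainder="passthrough",
)

# Chain the preprocessing and the tree so both are fitted together.
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
model.fit(X, y)

pred = model.predict(X)
print(pred)
```

If the target were a number instead (say, predicting Income), I assume the same
pipeline would work with tree.DecisionTreeRegressor() swapped in.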

Best,

Mike

On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka <
mail at sebastianraschka.com> wrote:

> Hi,
>
> if you have the category "car" as shown in your example, this would
> effectively be something like
>
> BMW=0
> Toyota=1
> Audi=2
>
> Sure, the algorithm will execute just fine on the feature column with
> values in {0, 1, 2}. However, the problem is that it will come up with
> binary rules like x_i >= 0.5 and x_i >= 1.5. I.e., it will treat the
> feature as a continuous (ordered) variable.
>
> What you can do is to encode this feature via one-hot encoding --
> basically extend it into 2 (or 3) binary variables. This has its own
> problems (if you have a feature with many possible values, you will end up
> with a large number of binary variables, and they may dominate in the
> resulting tree over other feature variables).
>
> In any case, I guess this is what
>
> > "scikit-learn implementation does not support categorical variables for
> > now".
>
> means ;).
>
> Best,
> Sebastian
>
> > On Sep 13, 2019, at 9:38 PM, C W <tmrsg11 at gmail.com> wrote:
> >
> > Hello all,
> > I'm very confused. Can the decision tree module handle both continuous
> > and categorical features in the dataset? In this case, it's just CART
> > (Classification and Regression Trees).
> >
> > For example,
> > Gender  Age  Income  Car     Attendance
> > Male    30   10000   BMW     Yes
> > Female  35    9000   Toyota  No
> > Male    50   12000   Audi    Yes
> >
> > According to the documentation
> > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> > it cannot!
> >
> > It says: "scikit-learn implementation does not support categorical
> > variables for now".
> >
> > Is this true? If not, can someone point me to an example? If yes, what
> > do people do?
> >
> > Thank you very much!
> >
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>