[scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
Andreas Mueller
t3kcit at gmail.com
Wed Sep 18 11:11:40 EDT 2019
On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
>
> On Sat, 14 Sep 2019 at 20:59, C W <tmrsg11 at gmail.com> wrote:
>
> Thanks, Guillaume.
> ColumnTransformer looks pretty neat. I've also heard, though, that
> this pipeline can be tedious to set up? Specifying what you want for
> every feature is a pain.
>
>
> It would help us to know which part of the pipeline is tedious to set
> up, so we can see whether something there can be improved.
> Do you mean that you would like the type of each feature
> (categorical/numerical) to be detected automatically and a
> default encoder/scaler applied, as discussed here:
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user perspective, it would be cleaner in some cases, at the
> cost of blindly applying a black box,
> which might be dangerous.
Also see dabl's EasyPreprocessor, which basically does that:
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
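
For illustration, the automatic type detection discussed above can be
sketched with plain scikit-learn and pandas. This is only a minimal
sketch, not how dabl implements it: the helper name
make_default_preprocessor is hypothetical, the column types are guessed
from the pandas dtypes alone, and the default transformers
(StandardScaler, OneHotEncoder) are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_default_preprocessor(df):
    """Hypothetical helper: guess column types from dtypes, pick defaults."""
    # Assumption: numeric dtype means continuous, anything else is categorical.
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
    return ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])


# Toy features taken from the example earlier in this thread.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

preprocessor = make_default_preprocessor(X)
Xt = preprocessor.fit_transform(X)
print(Xt.shape)
```

A dtype-based guess like this will misclassify integer-coded categories
and numbers stored as strings, which is the "black box" risk mentioned
above.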
>
> Javier,
> Actually, you guessed right. My real data has only one numerical
> variable, looks more like this:
>
> Gender  Date       Income  Car     Attendance
> Male    2019/3/01  10000   BMW     Yes
> Female  2019/5/02  9000    Toyota  No
> Male    2019/7/15  12000   Audi    Yes
>
> I am predicting income using all other categorical variables.
> Maybe it is catboost!
>
> Thanks,
>
> M
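
For data shaped like this, one way to stay within scikit-learn is to
one-hot encode the categorical columns and pass the result to a tree.
The following is a minimal sketch under assumptions: the three rows are
the ones quoted above, the Date column is reduced to a numeric Month
column (an illustrative choice, not something from the thread), and the
tree settings are defaults.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Date": ["2019/3/01", "2019/5/02", "2019/7/15"],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Assumption for the sketch: keep only the month of the date as a number.
df["Month"] = pd.to_datetime(df["Date"], format="%Y/%m/%d").dt.month

X = df[["Gender", "Car", "Attendance", "Month"]]
y = df["Income"]

# One-hot encode the categorical columns; pass Month through unchanged.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      ["Gender", "Car", "Attendance"])],
    remainder="passthrough",
)

model = make_pipeline(preprocess, DecisionTreeRegressor(random_state=0))
model.fit(X, y)
pred = model.predict(X)
print(pred)
```

With many high-cardinality categories the one-hot matrix gets wide and
the splits get less efficient, which is where the catboost suggestion
comes in.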
>
>
>
>
>
>
> On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlopez at ende.cc> wrote:
>
> If you have datasets with many categorical features, and
> perhaps many categories, the tools in sklearn are quite limited,
> but there are alternative implementations of boosted trees
> that are designed with categorical features in mind. Take a look
> at catboost [1], which has an sklearn-compatible API.
>
> J
>
> [1] https://catboost.ai/
>
> On Sat, Sep 14, 2019 at 3:40 AM C W <tmrsg11 at gmail.com> wrote:
>
> Hello all,
> I'm very confused. Can the decision tree module handle
> both continuous and categorical features in the dataset?
> In this case, it's just CART (Classification and
> Regression Trees).
>
> For example,
> Gender  Age  Income  Car     Attendance
> Male    30   10000   BMW     Yes
> Female  35   9000    Toyota  No
> Male    50   12000   Audi    Yes
>
> According to the documentation
> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> it cannot!
>
> It says: "scikit-learn implementation does not support
> categorical variables for now".
>
> Is this true? If not, can someone point me to an example?
> If yes, what do people do?
>
> Thank you very much!
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
>
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>