[scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?

Andreas Mueller t3kcit at gmail.com
Wed Sep 18 11:11:40 EDT 2019



On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
>
> On Sat, 14 Sep 2019 at 20:59, C W <tmrsg11 at gmail.com> wrote:
>
>     Thanks, Guillaume.
>     Column transformer looks pretty neat. I've also heard, though, that this
>     pipeline can be tedious to set up. Specifying what you want for
>     every feature is a pain.
>
>
> It would help us to know which part of the pipeline is tedious
> to set up, so that we can see whether something can be improved there.
> Do you mean that you would like the type of each feature
> (categorical/numerical) to be detected automatically and a default
> encoder/scaler applied, as discussed in the issue below?
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user's perspective, it would be cleaner in some cases, at the
> cost of blindly applying a black box,
> which might be dangerous.
Also see dabl's EasyPreprocessor, which basically does that:
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
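
For comparison, here is a minimal sketch of the same idea in plain
scikit-learn rather than dabl: columns are routed to an encoder or a scaler
based on their pandas dtype. It assumes scikit-learn >= 0.22 (for
make_column_selector) and pandas; the toy frame only mirrors the example
from the original question, and the choice of StandardScaler and
OneHotEncoder as defaults is mine. Dtype-based detection is a heuristic:
an integer-coded category would be treated as numeric here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy features modelled on the example in this thread (illustrative only).
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Route columns by dtype: numeric columns are scaled,
# every non-numeric column is one-hot encoded.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(),
     make_column_selector(dtype_include=np.number)),
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_exclude=np.number)),
])

Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # one scaled column plus one column per category level
```

Nothing has to be listed per feature, which is convenient, but it is also
exactly the black-box behaviour Guillaume warns about, so it is worth
checking which columns ended up in which branch.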


>
>     Javier,
>     Actually, you guessed right. My real data has only one numerical
>     variable, looks more like this:
>
>     Gender  Date       Income  Car     Attendance
>     Male    2019/3/01  10000   BMW     Yes
>     Female  2019/5/02   9000   Toyota  No
>     Male    2019/7/15  12000   Audi    Yes
>
>     I am predicting income using all other categorical variables.
>     Maybe it is catboost!
>
>     Thanks,
>
>     M
>
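A rough sketch of how data shaped like this could be fed to a scikit-learn
tree, assuming the categorical columns are one-hot encoded explicitly with
ColumnTransformer. The "Month" feature derived from Date is a hypothetical
choice of mine, not something proposed in the thread, and three rows are of
course only enough to show the mechanics.

```python
from datetime import datetime

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# The three example rows quoted above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Date": ["2019/3/01", "2019/5/02", "2019/7/15"],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Assumption: reduce each date to its month number so the tree can split on it.
df["Month"] = [datetime.strptime(d, "%Y/%m/%d").month for d in df["Date"]]

categorical = ["Gender", "Car", "Attendance"]
X = df[categorical + ["Month"]]
y = df["Income"]

# One-hot encode the categorical columns; pass the numeric Month through as is.
model = make_pipeline(
    ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",
    ),
    DecisionTreeRegressor(random_state=0),
)
model.fit(X, y)
pred = model.predict(X)
print(pred)
```

With many categories per column, one-hot encoding produces wide, sparse
inputs, which is where the catboost suggestion quoted below comes in.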
>
>     On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlopez at ende.cc> wrote:
>
>         If you have datasets with many categorical features, and
>         perhaps many categories, the tools in sklearn are quite limited,
>         but there are alternative implementations of boosted trees
>         that are designed with categorical features in mind. Take a look
>         at catboost [1], which has an sklearn-compatible API.
>
>         J
>
>         [1] https://catboost.ai/
>
>         On Sat, Sep 14, 2019 at 3:40 AM C W <tmrsg11 at gmail.com> wrote:
>
>             Hello all,
>             I'm very confused. Can the decision tree module handle
>             both continuous and categorical features in the dataset?
>             In this case, it's just CART (Classification and
>             Regression Trees).
>
>             For example,
>             Gender  Age  Income  Car     Attendance
>             Male    30   10000   BMW     Yes
>             Female  35    9000   Toyota  No
>             Male    50   12000   Audi    Yes
>
>             According to the documentation
>             https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>             it cannot!
>
>             It says: "scikit-learn implementation does not support
>             categorical variables for now".
>
>             Is this true? If not, can someone point me to an example?
>             If yes, what do people do?
>
>             Thank you very much!
>
>
>
>             _______________________________________________
>             scikit-learn mailing list
>             scikit-learn at python.org <mailto:scikit-learn at python.org>
>             https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> -- 
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>
