<div dir="ltr">Thanks, Guillaume. <div>ColumnTransformer looks pretty neat. I've also heard, though, that this pipeline can be tedious to set up? Specifying what you want for every feature is a pain.</div><div><br></div><div>Javier,</div><div>Actually, you guessed right. My real data has only one numerical variable and looks more like this:</div><div><br></div><div><div>Gender Date Income Car Attendance<br></div><div>Male 2019/3/01 10000 BMW Yes<br></div><div>Female 2019/5/02 9000 Toyota No<br></div><div>Male 2019/7/15 12000 Audi Yes</div></div><div><br></div><div>I am predicting income from all the other variables, which are categorical. Maybe catboost is the answer!</div><div><br></div><div>Thanks,</div><div><br></div><div>M</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019 at 9:25 AM Javier López &lt;jlopez@ende.cc&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, <div>but there are alternative implementations of boosted trees that are designed with categorical features in mind. 
Take a look at catboost [1], which has an sklearn-compatible API.</div><div><br></div><div>J</div><div><br></div><div>[1] <a href="https://catboost.ai/" target="_blank">https://catboost.ai/</a></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019 at 3:40 AM C W &lt;<a href="mailto:tmrsg11@gmail.com" target="_blank">tmrsg11@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello all,</div><div>I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees).<br></div><div><br></div><div>For example,</div><div>Gender Age Income Car Attendance<br></div><div>Male 30 10000 BMW Yes<br></div><div>Female 35 9000 Toyota No<br></div><div>Male 50 12000 Audi Yes<br></div><div><br></div><div>According to the documentation <a href="https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart" target="_blank">https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart</a>, it cannot! <br></div><div><br></div><div>It says: "scikit-learn
implementation does not support categorical variables for now". <br></div><div><br></div><div>Is this true? If not, can someone point me to an example? If yes, what do people do?<br></div><div><br></div><div>Thank you very much!<br></div><div><br></div><div><br></div><div><br></div></div>
_______________________________________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
</blockquote></div>
</blockquote></div>
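<div>For readers following the thread, here is a minimal sketch of one way to avoid listing every feature by hand, assuming pandas and scikit-learn are installed. It uses <code>make_column_selector</code> to pick out the non-numeric columns, one-hot encodes them, and feeds the result to a <code>DecisionTreeRegressor</code>. The three rows are the toy data quoted above; reducing the date to a month number is an assumption made purely for illustration, not something proposed in the thread.</div>

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy rows copied from the thread; a real dataset would be much larger.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Date": ["2019/3/01", "2019/5/02", "2019/7/15"],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Assumption (illustration only): keep just the month of the date as an integer.
df["Month"] = df["Date"].str.split("/").str[1].astype(int)

X = df.drop(columns=["Income", "Date"])
y = df["Income"]

# One-hot encode every non-numeric column without naming it explicitly;
# numeric columns (here only Month) are passed through unchanged.
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_exclude="number")),
    remainder="passthrough",
)

model = make_pipeline(preprocess, DecisionTreeRegressor(random_state=0))
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

<div>Because the selector works on dtypes, adding another categorical column to the data frame would not require any change to the pipeline. With only three rows the fitted tree simply memorizes the training data, so this shows the wiring rather than a meaningful model.</div>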