<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 9/15/19 8:16 AM, Guillaume Lemaître
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACDxx9iv5YyyAMnYfim1t4vre_8CDvV2CT=RVMMZwCXKJ5mfag@mail.gmail.com">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, 14 Sep 2019 at
20:59, C W &lt;<a href="mailto:tmrsg11@gmail.com"
moz-do-not-send="true">tmrsg11@gmail.com</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Thanks, Guillaume.
<div>Column transformer looks pretty neat. I've also heard,
though, that this pipeline can be tedious to set up.
Specifying what you want for every feature is a pain.</div>
</div>
</blockquote>
<div><br>
</div>
<div>It would help us to know which part of the pipeline
is tedious to set up, so that we can see whether something can be
improved there.</div>
<div>Do you mean that you would like to automatically detect
the type of each feature (categorical/numerical) and apply a</div>
<div>default encoder/scaler, as discussed here: <a
href="https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127"
moz-do-not-send="true">https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127</a></div>
<div><br>
</div>
<div>IMO, from a user perspective, it would be cleaner in some
cases, at the cost of blindly applying a black box,</div>
<div>which might be dangerous.<br>
</div>
</div>
</div>
</blockquote>
Also see <a
href="https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor">https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor</a>,<br>
which basically does that.<br>
<br>
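<div>To make the idea of automatic type detection with a default encoder concrete, here is a minimal, dependency-free sketch in plain Python. It is not how <code>dabl.EasyPreprocessor</code> or scikit-learn's <code>ColumnTransformer</code> work internally; the helper names <code>detect_types</code> and <code>one_hot_encode</code> are hypothetical, and the rows are taken from the example table in the original question.</div>

```python
# Dependency-free sketch: guess each column's type, then one-hot encode the
# categorical ones. All helper names here are hypothetical.

rows = [
    {"Gender": "Male", "Age": 30, "Car": "BMW", "Attendance": "Yes"},
    {"Gender": "Female", "Age": 35, "Car": "Toyota", "Attendance": "No"},
    {"Gender": "Male", "Age": 50, "Car": "Audi", "Attendance": "Yes"},
]


def detect_types(rows):
    """Call a column numerical if every value is an int or float, else categorical."""
    types = {}
    for name in rows[0]:
        is_number = all(isinstance(row[name], (int, float)) for row in rows)
        types[name] = "numerical" if is_number else "categorical"
    return types


def one_hot_encode(rows, types):
    """Pass numerical columns through; expand categorical columns into 0/1 indicators."""
    # Collect the sorted set of categories seen in each categorical column.
    categories = {
        name: sorted({row[name] for row in rows})
        for name, kind in types.items()
        if kind == "categorical"
    }

    # Build the output column names in the original column order.
    header = []
    for name, kind in types.items():
        if kind == "numerical":
            header.append(name)
        else:
            header.extend(f"{name}={value}" for value in categories[name])

    # Encode every row with the same column layout as the header.
    encoded = []
    for row in rows:
        out = []
        for name, kind in types.items():
            if kind == "numerical":
                out.append(row[name])
            else:
                out.extend(1 if row[name] == value else 0 for value in categories[name])
        encoded.append(out)
    return header, encoded


types = detect_types(rows)
header, encoded = one_hot_encode(rows, types)
print(types)
print(header)
for line in encoded:
    print(line)
```

<div>The detection rule is only a heuristic: a category stored as an integer code would be treated as numerical, which is the kind of silent mistake a blind default can make. In scikit-learn itself, this step is usually written with <code>ColumnTransformer</code> and <code>OneHotEncoder</code>.</div>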
<br>
<blockquote type="cite"
cite="mid:CACDxx9iv5YyyAMnYfim1t4vre_8CDvV2CT=RVMMZwCXKJ5mfag@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div><br>
</div>
<div>Javier,</div>
<div>Actually, you guessed right. My real data has only
one numerical variable and looks more like this:</div>
<div><br>
</div>
<div>
<pre>Gender  Date       Income  Car     Attendance
Male    2019/3/01  10000   BMW     Yes
Female  2019/5/02  9000    Toyota  No
Male    2019/7/15  12000   Audi    Yes</pre>
</div>
<div><br>
</div>
<div>I am predicting income from all the other variables, which
are categorical. Maybe catboost is the way to go!</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>M</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019 at
9:25 AM Javier López <a class="moz-txt-link-rfc2396E" href="mailto:jlopez@ende.cc">&lt;jlopez@ende.cc&gt;</a> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">If you have datasets with many
categorical features, and perhaps many categories, the
tools in sklearn are quite limited,
<div>but there are alternative implementations of
boosted trees that are designed with categorical
features in mind. Take a look</div>
<div>at catboost [1], which has an sklearn-compatible
API.</div>
<div><br>
</div>
<div>J</div>
<div><br>
</div>
<div>[1] <a href="https://catboost.ai/"
target="_blank" moz-do-not-send="true">https://catboost.ai/</a></div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019
at 3:40 AM C W &lt;<a
href="mailto:tmrsg11@gmail.com" target="_blank"
moz-do-not-send="true">tmrsg11@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hello all,</div>
<div>I'm very confused. Can the decision tree
module handle both continuous and categorical
features in the dataset? In this case, it's just
CART (Classification and Regression Trees).<br>
</div>
<div><br>
</div>
<div>For example,</div>
<div>
<pre>Gender  Age  Income  Car     Attendance
Male    30   10000   BMW     Yes
Female  35   9000    Toyota  No
Male    50   12000   Audi    Yes</pre>
</div>
<div><br>
</div>
<div>According to the documentation <a
href="https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart"
target="_blank" moz-do-not-send="true">https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart</a>,
it cannot! <br>
</div>
<div><br>
</div>
<div>It says: "scikit-learn implementation does
not support categorical variables for now". <br>
</div>
<div><br>
</div>
<div>Is this true? If not, can someone point me to
an example? If yes, what do people do?<br>
</div>
<div><br>
</div>
<div>Thank you very much!<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
_______________________________________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org"
target="_blank" moz-do-not-send="true">scikit-learn@python.org</a><br>
<a
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<br>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>Guillaume Lemaitre<br>
INRIA Saclay - Parietal team<br>
Center for Data Science Paris-Saclay<br>
<a href="https://glemaitre.github.io/"
target="_blank" moz-do-not-send="true">https://glemaitre.github.io/</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>