<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 9/15/19 8:16 AM, Guillaume Lemaître
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CACDxx9iv5YyyAMnYfim1t4vre_8CDvV2CT=RVMMZwCXKJ5mfag@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sat, 14 Sep 2019 at
            20:59, C W <<a href="mailto:tmrsg11@gmail.com"
              moz-do-not-send="true">tmrsg11@gmail.com</a>> wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">Thanks, Guillaume.
              <div>ColumnTransformer looks pretty neat. I've also heard,
                though, that this pipeline can be tedious to set up.
                Specifying what you want for every feature is a pain.</div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>It would be interesting for us to know which part of the
            pipeline is tedious to set up, so that we can see whether
            something could be improved there.</div>
          <div>Do you mean that you would like to automatically detect
            the type of each feature (categorical/numerical) and apply a
            default encoder/scaler, as discussed here: <a
href="https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127"
              moz-do-not-send="true">https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127</a>?</div>
          <div><br>
          </div>
          <div>IMO, from a user perspective, it would be cleaner in some
            cases, at the cost of blindly applying a black box, which
            might be dangerous.<br>
          </div>
        </div>
      </div>
    </blockquote>
    Also see <a
href="https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor">https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor</a>,
    which basically does that.<br>
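    <div>For illustration, here is a minimal sketch of what such automatic type detection could look like when assembled by hand with <code>ColumnTransformer</code>. It is neither the dabl implementation nor the proposal in the linked issue: the helper <code>auto_preprocessor</code> is hypothetical and simply assumes that numeric pandas dtypes are continuous and everything else is categorical. That is exactly the kind of blind guess Guillaume warns about; a numeric zip code, for example, would be treated as continuous.</div>
<pre>
```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def auto_preprocessor(X):
    """Hypothetical helper: guess each column's type from its pandas dtype.

    Numeric dtypes are assumed continuous and scaled; every other column is
    assumed categorical and one-hot encoded. No domain knowledge is used.
    """
    numerical = [col for col in X.columns if is_numeric_dtype(X[col])]
    categorical = [col for col in X.columns if col not in numerical]
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ])


# Toy data from the original question in this thread.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
})
y = ["Yes", "No", "Yes"]

model = make_pipeline(auto_preprocessor(X), DecisionTreeClassifier(random_state=0))
model.fit(X, y)
print(model.predict(X))
```
</pre>
    <div>The scaler only mirrors the "default encoder/scaling" idea; a decision tree is unaffected by this kind of rescaling of its inputs, so <code>"passthrough"</code> would work just as well for the numerical columns here.</div>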
    <br>
    <br>
    <blockquote type="cite"
cite="mid:CACDxx9iv5YyyAMnYfim1t4vre_8CDvV2CT=RVMMZwCXKJ5mfag@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div><br>
              </div>
              <div>Javier,</div>
              <div>Actually, you guessed right. My real data has only
                one numerical variable and looks more like this:</div>
              <div><br>
              </div>
              <div>
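                <table cellpadding="4">
                  <tr><th>Gender</th><th>Date</th><th>Income</th><th>Car</th><th>Attendance</th></tr>
                  <tr><td>Male</td><td>2019/3/01</td><td>10000</td><td>BMW</td><td>Yes</td></tr>
                  <tr><td>Female</td><td>2019/5/02</td><td>9000</td><td>Toyota</td><td>No</td></tr>
                  <tr><td>Male</td><td>2019/7/15</td><td>12000</td><td>Audi</td><td>Yes</td></tr>
                </table>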
              </div>
              <div><br>
              </div>
              <div>I am predicting income using all the other
                categorical variables. Maybe catboost is the way to go!</div>
              <div><br>
              </div>
              <div>Thanks,</div>
              <div><br>
              </div>
              <div>M</div>
            </div>
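            <div>As a sketch only (the column names come from the table above; the helper <code>date_parts</code> and the choice of encoders are assumptions, not something proposed in this thread), this data could be fed to scikit-learn's <code>DecisionTreeRegressor</code> by turning the date strings into numeric columns and mapping each category to an integer code with <code>OrdinalEncoder</code>:</div>
<pre>
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# The "real data" example from this message.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Date": ["2019/3/01", "2019/5/02", "2019/7/15"],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})
X, y = df.drop(columns="Income"), df["Income"]


def date_parts(frame):
    """Hypothetical helper: split the Date strings into numeric year/month/day."""
    dates = pd.to_datetime(frame["Date"], format="%Y/%m/%d")
    return pd.DataFrame({
        "year": dates.dt.year,
        "month": dates.dt.month,
        "day": dates.dt.day,
    })


preprocess = ColumnTransformer([
    # Each category becomes an integer code (alphabetical order by default).
    ("cat", OrdinalEncoder(), ["Gender", "Car", "Attendance"]),
    # The date becomes three numeric columns the tree can split on.
    ("date", FunctionTransformer(date_parts), ["Date"]),
])
model = make_pipeline(preprocess, DecisionTreeRegressor(random_state=0))
model.fit(X, y)
print(model.predict(X))
```
</pre>
            <div>Integer codes impose an arbitrary (alphabetical) order on the categories, which a tree can work around with extra splits; one-hot encoding avoids that at the cost of more columns. With its default settings, <code>OrdinalEncoder</code> also raises an error on categories not seen during <code>fit</code>.</div>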
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019 at
                9:25 AM Javier López <a class="moz-txt-link-rfc2396E" href="mailto:jlopez@ende.cc">&lt;jlopez@ende.cc&gt;</a> wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">
                <div dir="ltr">If you have datasets with many
                  categorical features, and perhaps many categories, the
                  tools in sklearn are quite limited, 
                  <div>but there are alternative implementations of
                    boosted trees that are designed with categorical
                    features in mind. Take a look</div>
                  <div>at catboost [1], which has an sklearn-compatible
                    API.</div>
                  <div><br>
                  </div>
                  <div>J</div>
                  <div><br>
                  </div>
                  <div>[1] <a href="https://catboost.ai/"
                      target="_blank" moz-do-not-send="true">https://catboost.ai/</a></div>
                </div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Sat, Sep 14, 2019
                    at 3:40 AM C W <<a
                      href="mailto:tmrsg11@gmail.com" target="_blank"
                      moz-do-not-send="true">tmrsg11@gmail.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px 0.8ex;border-left:1px solid
                    rgb(204,204,204);padding-left:1ex">
                    <div dir="ltr">
                      <div>Hello all,</div>
                      <div>I'm very confused. Can the decision tree
                        module handle both continuous and categorical
                        features in the dataset? In this case, it's just
                        CART (Classification and Regression Trees).<br>
                      </div>
                      <div><br>
                      </div>
                      <div>For example,</div>
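                      <table cellpadding="4">
                        <tr><th>Gender</th><th>Age</th><th>Income</th><th>Car</th><th>Attendance</th></tr>
                        <tr><td>Male</td><td>30</td><td>10000</td><td>BMW</td><td>Yes</td></tr>
                        <tr><td>Female</td><td>35</td><td>9000</td><td>Toyota</td><td>No</td></tr>
                        <tr><td>Male</td><td>50</td><td>12000</td><td>Audi</td><td>Yes</td></tr>
                      </table>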
                      <div><br>
                      </div>
                      <div>According to the documentation <a
href="https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart"
                          target="_blank" moz-do-not-send="true">https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart</a>,
                        it cannot!<br>
                      </div>
                      <div><br>
                      </div>
                      <div>It says: "scikit-learn implementation does
                        not support categorical variables for now". <br>
                      </div>
                      <div><br>
                      </div>
                      <div>Is this true? If not, can someone point me to
                        an example? If yes, what do people do?<br>
                      </div>
                      <div><br>
                      </div>
                      <div>Thank you very much!<br>
                      </div>
                    </div>
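                    <div>A common workaround, sketched here on the toy table above, is to one-hot encode the categorical columns before fitting the tree. Using <code>Attendance</code> as the target is an assumption for illustration, since the question does not say what is being predicted.</div>
<pre>
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data from the question; Attendance is assumed to be the target.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})
X, y = df.drop(columns="Attendance"), df["Attendance"]

preprocess = ColumnTransformer(
    # One 0/1 column per category; unseen categories at predict time become all zeros.
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Car"])],
    remainder="passthrough",  # Age and Income are used as-is
)
clf = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
clf.fit(X, y)

new_row = pd.DataFrame({"Gender": ["Female"], "Age": [40], "Income": [9500], "Car": ["BMW"]})
print(clf.predict(new_row))
```
</pre>
                    <div>After encoding, <code>Gender</code> becomes two 0/1 columns and <code>Car</code> three, so the tree works on seven numerical columns and can split on, for example, "Car is BMW" versus everything else.</div>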
                    _______________________________________________<br>
                    scikit-learn mailing list<br>
                    <a href="mailto:scikit-learn@python.org"
                      target="_blank" moz-do-not-send="true">scikit-learn@python.org</a><br>
                    <a
                      href="https://mail.python.org/mailman/listinfo/scikit-learn"
                      rel="noreferrer" target="_blank"
                      moz-do-not-send="true">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
        <br clear="all">
        <br>
        -- <br>
        <div dir="ltr" class="gmail_signature">
          <div dir="ltr">
            <div>
              <div dir="ltr">
                <div>
                  <div dir="ltr">
                    <div>Guillaume Lemaitre<br>
                      INRIA Saclay - Parietal team<br>
                      Center for Data Science Paris-Saclay<br>
                      <a href="https://glemaitre.github.io/"
                        target="_blank" moz-do-not-send="true">https://glemaitre.github.io/</a></div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>