<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Georg.<br>
    Unfortunately this is not entirely trivial right now, but it will be
    fixed by<br>
    <a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9151">https://github.com/scikit-learn/scikit-learn/pull/9151</a><br>
    and<br>
    <a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9012">https://github.com/scikit-learn/scikit-learn/pull/9012</a><br>
    which will be in the next release (0.20).<br>
    <br>
    LabelBinarizer is probably the best workaround for now, and
    selecting columns can be done (awkwardly)<br>
    as in this example:
<a class="moz-txt-link-freetext" href="http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py">http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py</a><br>
    <br>
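    A minimal sketch of that workaround, not the API from the PRs above:
    ColumnBinarizer is a hypothetical helper name, and the toy data and
    column index are made up for illustration. It fits one LabelBinarizer
    per categorical column on the training data and passes the remaining
    columns through unchanged.<br>
<pre>
```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer


class ColumnBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical helper: binarize selected columns, pass the rest through."""

    def __init__(self, categorical_columns):
        self.categorical_columns = categorical_columns

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # One LabelBinarizer per categorical column, fitted on training data only.
        self.binarizers_ = {
            col: LabelBinarizer().fit(X[:, col].astype(str))
            for col in self.categorical_columns
        }
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        blocks = []
        for col in range(X.shape[1]):
            if col in self.binarizers_:
                # Categorical column: expand into indicator columns.
                blocks.append(self.binarizers_[col].transform(X[:, col].astype(str)))
            else:
                # Non-categorical column: keep as a single numeric column.
                blocks.append(X[:, [col]].astype(float))
        return np.hstack(blocks)


# Toy data: column 0 is categorical, column 1 is numeric.
X_train = np.array([["red", 1.0], ["green", 2.0], ["blue", 3.0]], dtype=object)
X_test = np.array([["green", 4.0], ["purple", 5.0]], dtype=object)

encoder = ColumnBinarizer(categorical_columns=[0]).fit(X_train)
encoded_test = encoder.transform(X_test)
print(encoded_test)
```
</pre>
    With three or more training classes in a column, a label that was not
    seen during fit ("purple" here) comes out as an all-zero indicator
    row. A column with exactly two classes is encoded as a single 0/1
    column, so an unseen label there cannot be told apart from the first
    class.<br>
    <br>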
    Best,<br>
    Andy<br>
    <br>
    <div class="moz-cite-prefix">On 08/17/2017 07:50 AM, Georg Heiler
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAMhVi-RWWMhcjCmHr11-=5hb3+_GTG5UZnfHZz0-nvDBD+ZH=w@mail.gmail.com">
      <div dir="ltr">Hi,
        <div><br>
        </div>
        <div>How can I properly handle categorical values in
          scikit-learn?</div>
        <div><a
href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"
            moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> <br>
        </div>
        <div><br>
        </div>
        <div>
          <p style="margin:1em 0px
            0px;padding:0px;text-align:justify;font-size:14px">Goals:</p>
          <ul style="margin:1em 2em
            0px;padding:0px;list-style-position:initial;font-size:14px">
            <li style="margin:0px;padding:0px;line-height:20px">scikit-learn
              style fit/transform methods to encode labels of categorical
              features of X</li>
            <li style="margin:0px;padding:0px;line-height:20px">should
              handle unseen labels</li>
            <li style="margin:0px;padding:0px;line-height:20px">should
              be faster than running a label encoder manually for each
              fold and manually checking whether the label was already
              seen in the training data, i.e. what I currently do (<a
href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"
                style="margin:0px;padding:0px;color:rgb(0,136,204)"
                moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> which
              links to <a
                href="https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce"
                style="margin:0px;padding:0px;color:rgb(0,136,204)"
                moz-do-not-send="true">https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce</a>)</li>
            <li style="margin:0px;padding:0px;line-height:20px">only
              some columns are categorical, and only these should be
              converted</li>
          </ul>
          <div><br>
          </div>
        </div>
        <div>Regards,</div>
        <div>Georg</div>
      </div>
    </blockquote>
    <br>
  </body>
</html>