<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Georg.<br>

    Unfortunately, this is not entirely trivial right now, but it will be
    fixed by<br>

    <a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9151">https://github.com/scikit-learn/scikit-learn/pull/9151</a><br>

    and<br>

    <a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9012">https://github.com/scikit-learn/scikit-learn/pull/9012</a><br>

    which will be in the next release (0.20).<br>
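    <br>
    For reference, here is a minimal sketch of how this might look with the API
    that eventually shipped in scikit-learn 0.20: sklearn.compose.ColumnTransformer
    together with OneHotEncoder(handle_unknown="ignore"), which accepts string
    categories. It assumes scikit-learn 0.20 or later. The toy DataFrames and the
    "color"/"size" column names are made up for illustration:<br>
    <pre>
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): "color" is categorical, "size" is numeric.
train = pd.DataFrame({"color": ["red", "green", "blue", "red"],
                      "size": [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"color": ["purple", "green"], "size": [5.0, 6.0]})

ct = ColumnTransformer(
    # One-hot encode only the "color" column; unseen categories are ignored.
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    # All other columns ("size") are passed through unchanged.
    remainder="passthrough",
    # Always return a dense array (easier to inspect in this sketch).
    sparse_threshold=0,
)

ct.fit(train)
out = ct.transform(test)  # columns: blue, green, red, size
```
</pre>
    Only the listed column is encoded and the rest passes through via
    remainder="passthrough". A category not seen during fit ("purple" here)
    becomes an all-zero row instead of raising an error, which is what the
    default handle_unknown="error" would do.<br>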

    <br>

    LabelBinarizer is probably the best workaround for now, and
    selecting columns can be done (awkwardly)<br>
    as in this example:

<a class="moz-txt-link-freetext" href="http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py">http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py</a><br>
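    <br>
    A rough sketch of that workaround, assuming a pandas DataFrame as input and
    following the FeatureUnion/Pipeline pattern of the linked example.
    ColumnSelector and BinarizedColumn are hypothetical helpers written for this
    sketch, not scikit-learn classes. The wrapper exists because
    LabelBinarizer.fit takes only y, so it cannot be used directly as a pipeline
    step on X:<br>
    <pre>
```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper, similar to ItemSelector in the linked example.
    Returns X[key] as a NumPy array (1-D for a single name, 2-D for a list)."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key].to_numpy()


class BinarizedColumn(TransformerMixin, BaseEstimator):
    """Hypothetical adapter: LabelBinarizer.fit only accepts y, so wrap it
    to use it as a step that receives a single 1-D feature column."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer().fit(X)
        return self

    def transform(self, X):
        # With 3+ training classes, labels unseen in fit become all-zero rows.
        return self.binarizer_.transform(X)


features = FeatureUnion([
    # Categorical column: select it, then one-hot encode it.
    ("color", Pipeline([
        ("select", ColumnSelector("color")),
        ("binarize", BinarizedColumn()),
    ])),
    # Numeric column: selected as a 2-D array and left unchanged.
    ("size", ColumnSelector(["size"])),
])

# Toy data (made up for illustration).
train = pd.DataFrame({"color": ["red", "green", "blue", "red"],
                      "size": [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"color": ["purple", "green"], "size": [5.0, 6.0]})

features.fit(train)
encoded = features.transform(test)  # columns: blue, green, red, size
```
</pre>
    The label "purple" was not seen during fit, so it comes out as an all-zero
    row and no per-fold bookkeeping is needed. Note that with only two training
    classes LabelBinarizer emits a single column, so an unseen label then looks
    exactly like the first class.<br>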

    <br>

    Best,<br>

    Andy<br>

    <br>

    <div class="moz-cite-prefix">On 08/17/2017 07:50 AM, Georg Heiler

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAMhVi-RWWMhcjCmHr11-=5hb3+_GTG5UZnfHZz0-nvDBD+ZH=w@mail.gmail.com">

      <div dir="ltr">Hi,

        <div><br>

        </div>

        <div>how can I properly handle categorical values in

          scikit-learn?</div>

        <div><a

href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"

            moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> <br>

        </div>

        <div><br>

        </div>

        <div>

          <p style="margin:1em 0px

            0px;padding:0px;text-align:justify;font-size:14px">goals</p>

          <ul style="margin:1em 2em

            0px;padding:0px;list-style-position:initial;font-size:14px">

            <li style="margin:0px;padding:0px;line-height:20px">scikit-learn
              style fit/transform methods to encode the labels of categorical
              features of X</li>

            <li style="margin:0px;padding:0px;line-height:20px">should

              handle unseen labels</li>

            <li style="margin:0px;padding:0px;line-height:20px">should

              be faster than running a label encoder manually for each

              fold and manually checking whether the label was already seen
              in the training data, i.e. what I currently do (<a

href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"

                style="margin:0px;padding:0px;color:rgb(0,136,204)"

                moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> which

              links to <a

                href="https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce"

                style="margin:0px;padding:0px;color:rgb(0,136,204)"

                moz-do-not-send="true">https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce</a>)</li>

            <li style="margin:0px;padding:0px;line-height:20px">only

              some columns are categorical, and only these should be

              converted</li>

          </ul>

          <div><br>

          </div>

        </div>

        <div>Regards,</div>

        <div>Georg</div>

      </div>

      <br>


      <br>

      <pre wrap="">_______________________________________________

scikit-learn mailing list

<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>

<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>