<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Georg.<br>
Unfortunately, this is not entirely trivial right now, but it will be
fixed by<br>
<a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9151">https://github.com/scikit-learn/scikit-learn/pull/9151</a><br>
and<br>
<a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/pull/9012">https://github.com/scikit-learn/scikit-learn/pull/9012</a><br>
which will be in the next release (0.20).<br>
<br>
LabelBinarizer is probably the best work-around for now, and
selecting columns can be done (awkwardly),<br>
as in this example:
<a class="moz-txt-link-freetext" href="http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py">http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py</a><br>
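<br>
A minimal sketch of that work-around, assuming the data is a pandas DataFrame and the non-categorical columns are numeric. The class name <code>ColumnBinarizer</code> and the column names <code>color</code> and <code>size</code> are made up for illustration and are not part of scikit-learn; only <code>LabelBinarizer</code>, <code>BaseEstimator</code> and <code>TransformerMixin</code> are. Unseen labels are masked out explicitly, so the sketch does not depend on how <code>LabelBinarizer</code> itself treats them:<br>
<pre>
```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer


class ColumnBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: one-hot encode only the listed DataFrame columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Fit one LabelBinarizer per categorical column on the training data.
        self.binarizers_ = {c: LabelBinarizer().fit(X[c]) for c in self.columns}
        return self

    def transform(self, X):
        # Non-categorical columns pass through unchanged (assumed numeric).
        blocks = [X.drop(columns=self.columns).to_numpy(dtype=float)]
        for c in self.columns:
            lb = self.binarizers_[c]
            values = X[c].to_numpy()
            seen = np.isin(values, lb.classes_)
            # Rows with labels not seen during fit stay all-zero.
            block = np.zeros((len(values), len(lb.classes_)))
            if seen.any():
                encoded = lb.transform(values[seen])
                if encoded.shape[1] != block.shape[1]:
                    # Two-class case: LabelBinarizer emits a single 0/1 column.
                    encoded = np.hstack([1 - encoded, encoded])
                block[seen] = encoded
            blocks.append(block)
        return np.hstack(blocks)


# Illustrative data; "purple" does not occur in the training frame.
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"color": ["green", "purple"], "size": [4.0, 5.0]})

encoder = ColumnBinarizer(columns=["color"]).fit(train)
encoded_test = encoder.transform(test)
print(encoded_test)
```
</pre>
Because the encoder is fitted once on the training fold and reused on the held-out fold, there is no need to check each label by hand: a row whose label was not seen during fit simply gets zeros in all of that column's indicator positions.<br>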
<br>
Best,<br>
Andy<br>
<br>
<div class="moz-cite-prefix">On 08/17/2017 07:50 AM, Georg Heiler
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMhVi-RWWMhcjCmHr11-=5hb3+_GTG5UZnfHZz0-nvDBD+ZH=w@mail.gmail.com">
<div dir="ltr">Hi,
<div><br>
</div>
<div>How can I properly handle categorical values in
scikit-learn?</div>
<div><a
href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"
moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> <br>
</div>
<div><br>
</div>
<div>
<p style="margin:1em 0px
0px;padding:0px;text-align:justify;font-size:14px">Goals:</p>
<ul style="margin:1em 2em
0px;padding:0px;list-style-position:initial;font-size:14px">
<li style="margin:0px;padding:0px;line-height:20px">scikit-learn-style
fit/transform methods to encode the labels of categorical
features of X</li>
<li style="margin:0px;padding:0px;line-height:20px">should
handle unseen labels</li>
<li style="margin:0px;padding:0px;line-height:20px">should
be faster than running a label encoder manually for each
fold and manually checking whether each label was already seen
in the training data, which is what I currently do (<a
href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934"
style="margin:0px;padding:0px;color:rgb(0,136,204)"
moz-do-not-send="true">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a> which
links to <a
href="https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce"
style="margin:0px;padding:0px;color:rgb(0,136,204)"
moz-do-not-send="true">https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce</a>)</li>
<li style="margin:0px;padding:0px;line-height:20px">only
some columns are categorical, and only these should be
converted</li>
</ul>
<div><br>
</div>
</div>
<div>Regards,</div>
<div>Georg</div>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
scikit-learn mailing list
<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>
</pre>
</blockquote>
<br>
</body>
</html>