<div dir="ltr">Hi,<div><br></div><div>How can I properly handle categorical values in scikit-learn?</div><div><a href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a><br></div><div><br></div><div><p style="margin:1em 0px 0px;padding:0px;text-align:justify;font-size:14px">Goals:</p><ul style="margin:1em 2em 0px;padding:0px;list-style-position:initial;font-size:14px"><li style="margin:0px;padding:0px;line-height:20px">scikit-learn style fit/transform methods to encode the labels of categorical features of X</li><li style="margin:0px;padding:0px;line-height:20px">should handle unseen labels</li><li style="margin:0px;padding:0px;line-height:20px">should be faster than running a label encoder manually for each fold and manually checking whether each label was already seen in the training data, which is what I currently do (<a href="https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934" style="margin:0px;padding:0px;color:rgb(0,136,204)">https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934</a>, which links to <a href="https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce" style="margin:0px;padding:0px;color:rgb(0,136,204)">https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce</a>)</li><li style="margin:0px;padding:0px;line-height:20px">only some columns are categorical, and only these should be converted</li></ul><div><br></div></div><div>Regards,</div><div>Georg</div></div>
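<div dir="ltr">To make the goals above concrete, here is a minimal, dependency-free sketch of the behaviour being asked for. The class name <code>CategoricalCodeEncoder</code>, its parameters, and the sample data are all hypothetical and are not part of scikit-learn or of the linked gist; the sketch only illustrates the fit/transform contract, the handling of unseen labels via a reserved code, and leaving non-categorical columns untouched. It makes no claim about speed.</div>

```python
class CategoricalCodeEncoder:
    """Hypothetical helper (not a scikit-learn class).

    Learns an integer code per label for the chosen columns during fit,
    and maps labels never seen during fit to `unknown_code` in transform.
    """

    def __init__(self, columns, unknown_code=-1):
        self.columns = list(columns)      # only these columns are encoded
        self.unknown_code = unknown_code  # code reserved for unseen labels

    def fit(self, rows):
        # rows is a list of dicts, one dict per sample.
        self.mappings_ = {}
        for col in self.columns:
            labels = sorted({row[col] for row in rows})
            self.mappings_[col] = {label: code for code, label in enumerate(labels)}
        return self

    def transform(self, rows):
        encoded = []
        for row in rows:
            new_row = dict(row)  # copy, so other columns pass through unchanged
            for col in self.columns:
                new_row[col] = self.mappings_[col].get(row[col], self.unknown_code)
            encoded.append(new_row)
        return encoded


# Made-up training and test folds; "Linz" does not occur in the training fold.
train = [
    {"city": "Vienna", "color": "red", "size": 1.5},
    {"city": "Graz", "color": "blue", "size": 2.0},
]
test = [{"city": "Linz", "color": "red", "size": 3.0}]

encoder = CategoricalCodeEncoder(columns=["city", "color"]).fit(train)
train_encoded = encoder.transform(train)
test_encoded = encoder.transform(test)
```

<div dir="ltr">Fitting on the training fold only and reusing the fitted encoder on the test fold is what avoids the manual per-fold check for new labels. Recent scikit-learn releases also provide <code>OrdinalEncoder</code> (with <code>handle_unknown="use_encoded_value"</code>) and <code>ColumnTransformer</code>, which may cover the same goals; whether they are available depends on the installed version.</div>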