[scikit-learn] Categorical handling
Georg Heiler
georg.kf.heiler at gmail.com
Thu Aug 17 07:50:33 EDT 2017
Hi,
how can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
Goals:
- scikit-learn style fit/transform methods to encode labels of
categorical features of X
- should handle unseen labels
- should be faster than running a label encoder manually for each fold
and manually checking whether each label was already seen in the training
data, which is what I currently do (
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
which links to https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce
)
- only some columns are categorical, and only these should be converted
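
To make the goals above concrete, here is a rough sketch of the kind of
interface I have in mind. It is not taken from the linked gist. It assumes a
scikit-learn version (0.24 or later) in which OrdinalEncoder accepts
handle_unknown="use_encoded_value", and the toy data, the column names
"color" and "size", and the choice of -1 for unseen labels are all made up
for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: "color" is categorical, "size" is numeric.
train = pd.DataFrame({"color": ["red", "green", "red"],
                      "size": [1.0, 2.0, 3.0]})
# "blue" never appears in the training data.
test = pd.DataFrame({"color": ["green", "blue"],
                     "size": [4.0, 5.0]})

# Encode only the categorical column; pass every other column through unchanged.
# Unseen labels are mapped to -1 (an arbitrary choice for this sketch).
encoder = ColumnTransformer(
    transformers=[
        ("cat",
         OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
         ["color"]),
    ],
    remainder="passthrough",
)

# Fit on the training fold only, then reuse the fitted encoder on other folds.
encoder.fit(train)
encoded_train = encoder.transform(train)
encoded_test = encoder.transform(test)
```

The encoded categorical column comes first in the output, followed by the
passed-through columns. Because the encoder is fitted once per training fold,
there is no need to check by hand whether a label was already seen.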
Regards,
Georg