[scikit-learn] decision trees

Olivier Grisel olivier.grisel at ensta.org
Wed Mar 29 03:32:39 EDT 2017


For large enough models (e.g. random forests or gradient boosted tree
ensembles), I would definitely recommend arbitrary integer coding for
the categorical variables.

Try both encodings, compare them with cross-validation, and see for
yourself.
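
A minimal sketch of such a comparison, assuming the alternative to
integer coding is one-hot encoding and a scikit-learn version that
provides OrdinalEncoder. The data, the column names (color, shape) and
the noise level are made up for illustration; substitute your own
dataset and estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic data (an assumption for illustration): two categorical features.
rng = np.random.RandomState(0)
n_samples = 300
color_levels = ["red", "green", "blue", "black", "white", "grey"]
shape_levels = ["round", "square", "flat", "long"]
color = rng.choice(color_levels, size=n_samples)
shape = rng.choice(shape_levels, size=n_samples)
X = np.column_stack([color, shape]).astype(object)

# The target depends on both features, with 10% of the labels flipped.
y = np.isin(color, ["red", "blue", "grey"]) ^ (shape == "round")
flip = rng.uniform(size=n_samples) < 0.1
y = (y ^ flip).astype(int)

# Passing the levels explicitly keeps every CV fold on the same coding.
categories = [color_levels, shape_levels]
encoders = {
    "ordinal": OrdinalEncoder(categories=categories),
    "one-hot": OneHotEncoder(categories=categories),
}

scores = {}
for name, encoder in encoders.items():
    model = make_pipeline(
        encoder,
        RandomForestClassifier(n_estimators=50, random_state=0),
    )
    # 5-fold cross-validated accuracy for this encoding.
    scores[name] = cross_val_score(model, X, y, cv=5)
    print(name, scores[name].mean(), scores[name].std())
```

Which encoding comes out ahead depends on the data and on the model
size, so read the two mean scores together with their spread across
folds.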

-- 
Olivier

