March 29, 2017
1:02 p.m.
For large enough models (e.g. random forests or gradient-boosted tree ensembles) I would definitely recommend arbitrary integer coding for the categorical variables. Try both encodings, compare them with cross-validation, and see for yourself. -- Olivier
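
A minimal sketch of the comparison suggested above, assuming scikit-learn is the library in use and that the alternative to integer coding is one-hot encoding. The synthetic data, the column layout, and the helper name `evaluate` are all hypothetical and only there to make the example self-contained; swap in your own X and y.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic data (an assumption for illustration only):
# column 0 is a categorical variable with 20 levels, column 1 is numeric.
rng = np.random.RandomState(0)
n_samples = 500
cat = rng.randint(0, 20, size=n_samples)
num = rng.normal(size=n_samples)
X = np.column_stack([cat, num])
y = ((cat % 3 == 0) ^ (num > 0)).astype(int)


def evaluate(encoder):
    """Cross-validate a random forest with the given encoder on column 0."""
    preprocess = make_column_transformer((encoder, [0]), remainder="passthrough")
    model = make_pipeline(
        preprocess,
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    return cross_val_score(model, X, y, cv=5)


scores = {
    "integer": evaluate(OrdinalEncoder()),
    "one-hot": evaluate(OneHotEncoder(handle_unknown="ignore")),
}
for name, fold_scores in scores.items():
    print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

Which encoding comes out ahead on this toy data says nothing general; the point is only the procedure of scoring both pipelines on the same folds before choosing one.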