From a theoretical point of view, yes, you should one-hot encode your categorical variables if you don't want any ordering to be implied.
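To make the distinction concrete, here is a minimal sketch that applies scikit-learn's `OrdinalEncoder` and `OneHotEncoder` (both available since scikit-learn 0.20) to a made-up color feature. The data are hypothetical. Integer coding maps each level to a single number, so a model sees an arbitrary order (blue < green < red). One-hot coding gives each level its own indicator column instead.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical categorical feature with three unordered levels.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Integer (ordinal) coding: categories are sorted and mapped to 0, 1, 2,
# which imposes an arbitrary order blue < green < red.
ordinal = OrdinalEncoder().fit_transform(colors)

# One-hot coding: one 0/1 indicator column per category, so no level
# is numerically "greater" than another.
one_hot = OneHotEncoder().fit_transform(colors).toarray()

print(ordinal.ravel())  # [2. 1. 0. 1.]
print(one_hot)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```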
Brian
On 29 Mar 2017 08:40, "Andrew Howe" <ahowe42@gmail.com> wrote:
My question is more along the lines of: will the DT (decision tree) classifier falsely infer an ordering?
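A decision tree never compares categories directly. It splits a numeric column on a threshold (`x <= t`). With integer coding, a single split can therefore only separate the codes below a threshold from the codes above it. The arbitrary order does constrain what one split can do, although a deeper tree can compensate with additional splits. The toy example below uses `DecisionTreeClassifier` on hypothetical data: three levels coded 0, 1 and 2, where only the middle code is positive. It is a sketch of the mechanism, not a general benchmark.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical feature coded arbitrarily as 0, 1, 2,
# where only the "middle" code (1) belongs to the positive class.
codes = np.repeat([0, 1, 2], 10).reshape(-1, 1)
y = (codes.ravel() == 1).astype(int)

# Integer coding: a depth-1 tree can only split as "code <= t",
# so it cannot isolate code 1 from both 0 and 2 in a single split.
stump_int = DecisionTreeClassifier(max_depth=1, random_state=0).fit(codes, y)

# One-hot coding: a single split on the "code == 1" indicator column suffices.
one_hot = OneHotEncoder().fit_transform(codes).toarray()
stump_ohe = DecisionTreeClassifier(max_depth=1, random_state=0).fit(one_hot, y)

# A deeper tree on the integer codes recovers the same partition with two splits.
tree_int = DecisionTreeClassifier(max_depth=2, random_state=0).fit(codes, y)

print(stump_int.score(codes, y))    # about 0.667 (training accuracy)
print(stump_ohe.score(one_hot, y))  # 1.0
print(tree_int.score(codes, y))     # 1.0
```

The stump on integer codes reaches only 2/3 training accuracy, while the depth-2 tree matches the one-hot stump. This is one way to read the argument that sufficiently large tree ensembles can cope with arbitrary integer codes: the tree does not "believe" the ordering, but it may need extra splits to work around it.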
<~~~~~~~~~~~~~~~~~~~~~~~~~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
On Wed, Mar 29, 2017 at 10:32 AM, Olivier Grisel <olivier.grisel@ensta.org> wrote:
For large enough models (e.g. random forests or gradient-boosted tree ensembles), I would definitely recommend arbitrary integer coding for the categorical variables.
Try both, use cross-validation, and see for yourself.
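The "try both, use cross-validation" comparison can be sketched as follows. This assumes scikit-learn 0.20 or later and purely synthetic data. The dataset, the eight levels, the positive levels and the 10% label noise are all invented for illustration, so the resulting scores say nothing about any real dataset. The categories are passed explicitly to both encoders so that every cross-validation fold uses the same encoding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical synthetic data: one categorical feature with 8 levels whose
# label is not monotonic in the alphabetical integer codes, plus 10% label noise.
rng = np.random.default_rng(0)
levels = np.array(list("abcdefgh"))
idx = rng.integers(0, len(levels), size=600)
X = levels[idx].reshape(-1, 1)
y = np.isin(idx, [1, 4, 6]).astype(int)
flip = rng.random(len(y)) < 0.1
y = np.where(flip, 1 - y, y)

# Declaring the categories up front keeps every CV fold's encoder consistent.
cats = [levels]
pipelines = {
    "integer coding": make_pipeline(
        OrdinalEncoder(categories=cats),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
    "one-hot coding": make_pipeline(
        OneHotEncoder(categories=cats),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}

results = {}
for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5)  # accuracy on each of 5 folds
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the encoder inside the pipeline means it is refit on each training fold, which is the usual way to avoid leaking information from the validation fold. On real data, the same loop can be run with your own `X` and `y`.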
-- Olivier
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn