[scikit-learn] decision trees

Brian Holt bdholt1 at gmail.com
Wed Mar 29 04:52:11 EDT 2017


From a theoretical point of view, yes: you should one-hot-encode your
categorical variables if you don't want any ordering to be implied.

Brian
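
For what it's worth, a minimal sketch of what that looks like with
scikit-learn's OneHotEncoder (the colour feature below is made up for
illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A made-up unordered categorical feature with three levels.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()
X = enc.fit_transform(colors).toarray()

print(enc.categories_[0])  # columns are ordered: blue, green, red
print(X)                   # one row per sample, exactly one 1 per row
```

Each level becomes its own binary column, so a tree can split on any single
level without any ordering between levels being implied.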

On 29 Mar 2017 08:40, "Andrew Howe" <ahowe42 at gmail.com> wrote:

> My question is more along the lines of: will the DT classifier falsely
> infer an ordering?
>
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> J. Andrew Howe, PhD
> www.andrewhowe.com
> http://www.linkedin.com/in/ahowe42
> https://www.researchgate.net/profile/John_Howe12/
> I live to learn, so I can learn to live. - me
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>
> On Wed, Mar 29, 2017 at 10:32 AM, Olivier Grisel
> <olivier.grisel at ensta.org> wrote:
>
>> For large enough models (e.g. random forests or gradient-boosted tree
>> ensembles) I would definitely recommend arbitrary integer coding for
>> the categorical variables.
>>
>> Try both, use cross-validation and see for yourself.
>>
>> --
>> Olivier
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
>
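
Following Olivier's "try both" advice, a rough sketch of that comparison
(synthetic data; OrdinalEncoder stands in here for arbitrary integer coding):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.RandomState(0)
# Synthetic unordered categorical feature with 8 levels.
cats = rng.randint(0, 8, size=(500, 1)).astype(str)
# The label depends on membership in an arbitrary subset of levels,
# so there is no meaningful ordering for integer codes to capture.
y = np.isin(cats.ravel(), ["1", "4", "6"]).astype(int)

X_int = OrdinalEncoder().fit_transform(cats)           # arbitrary integer codes
X_ohe = OneHotEncoder().fit_transform(cats).toarray()  # one column per level

clf = RandomForestClassifier(n_estimators=50, random_state=0)
print("integer coding:", cross_val_score(clf, X_int, y, cv=5).mean())
print("one-hot coding:", cross_val_score(clf, X_ohe, y, cv=5).mean())
```

A deep enough ensemble can carve out any subset of integer codes with
repeated splits, which is why arbitrary integer coding often works fine in
practice despite the theoretical objection.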