<div dir="ltr">Traditionally, tree-based methods are very good with categorical variables and can handle them appropriately. There is currently a WIP PR to add this support to sklearn. I'm not exactly sure what you mean by "perform better", though. Estimators that ignore the categorical nature of these variables and simply treat them as discrete numeric values will likely perform worse than those that treat them appropriately.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <span dir="ltr">&lt;<a href="mailto:raga.markely@gmail.com" target="_blank">raga.markely@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div>I am wondering whether some classifiers perform better on datasets with categorical features (converted into a sparse input matrix with pd.get_dummies()). The categorical features are nominal (order doesn't matter, e.g. country, occupation, etc.).</div><div><br></div><div>If you could point me to some references (papers, books, websites, etc.), that would be great.</div><div><br></div><div>Thank you very much!</div><span class="HOEnZb"><font color="#888888"><div>Raga</div><div><br></div><div><br></div></font></span></div>
<br></blockquote></div><br></div>