[scikit-learn] Classifiers for dataset with categorical features

Raga Markely raga.markely at gmail.com
Fri Jul 21 14:37:25 EDT 2017


Thank you, Jacob. Appreciate it.

Regarding 'perform better', I was referring to better accuracy, precision,
recall, F1 score, etc.
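
To make the comparison concrete, here is a minimal sketch of how I would measure those metrics for a couple of classifiers on one-hot encoded nominal features. The toy data, the column names, and the choice of the two estimators are my own assumptions for illustration only, not a recommendation from this thread.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Hypothetical toy data: two nominal features and a binary label.
df = pd.DataFrame({
    "country": ["US", "FR", "JP", "US", "FR", "JP"] * 10,
    "occupation": ["eng", "doc", "eng", "art", "art", "doc"] * 10,
    "label": [1, 0, 1, 0, 0, 1] * 10,
})

# One indicator column per category level, as in the original question.
X = pd.get_dummies(df[["country", "occupation"]])
y = df["label"]

# The metrics mentioned above, using scikit-learn's built-in scorer names.
scoring = ["accuracy", "precision", "recall", "f1"]

# Two example estimators (an arbitrary choice for this sketch).
models = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated score per metric for each estimator.
results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)
    results[name] = {m: cv["test_" + m].mean() for m in scoring}
    print(name, results[name])
```

On real data the ranking will depend on the dataset, so the same loop would need to be run on the actual features and labels.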

Thanks,
Raga

On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <jmschreiber91 at gmail.com>
wrote:

> Traditionally, tree-based methods are very good when it comes to
> categorical variables and can handle them appropriately. There is a current
> WIP PR to add this support to sklearn. I'm not exactly sure what you mean
> by "perform better", though. Estimators that ignore the categorical aspect
> of these variables and treat them as discrete will likely perform worse
> than those that treat them appropriately.
>
> On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.markely at gmail.com>
> wrote:
>
>> Hello,
>>
>> I am wondering if there are some classifiers that perform better for
>> datasets with categorical features (converted into a sparse input matrix
>> with pd.get_dummies())? The data for the categorical features are nominal
>> (order doesn't matter, e.g. country, occupation, etc.).
>>
>> If you could point me to some references (papers, books, websites, etc.),
>> that would be great.
>>
>> Thank you very much!
>> Raga
>>
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>