The right thing to do would probably be to write a scikit-learn-contrib package for them and see if they gather traction. If they perform well on e.g. Kaggle competitions, we know that we need them in :). Cheers, Gaël On Fri, Jul 21, 2017 at 07:09:03PM -0400, Sebastian Raschka wrote:
Maybe because they are genetic algorithms, which are -- for some reason -- not very popular in the ML field in general :P. (People in bioinformatics seem to use them a lot, though.) Also, the name "Learning Classifier Systems" is a bit weird, I must say: I remember that when Ryan introduced me to those, I was like "ah yeah, sure, I know machine learning classifiers" ;)
On Jul 21, 2017, at 3:01 PM, Stuart Reynolds <stuart@stuartreynolds.net> wrote:
+1 LCS and its many many variants seem very practical and adaptable. I'm not sure why they haven't gotten traction. Overshadowed by GBM & random forests?
On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka <se.raschka@gmail.com> wrote:
Just to throw some additional ideas in here. Based on a conversation with a colleague some time ago, I think learning classifier systems (https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly useful when working with large, sparse binary vectors (like those from a one-hot encoding). I am really not into LCSs, and only know the basics (I read through the first chapters of the Intro to Learning Classifier Systems draft; the print version will be out later this year). Also, I once saw an interesting poster on a Set Covering Machine algorithm, which was benchmarked against SVMs, random forests, and the like on categorical (genomics) data. Looked promising.
Best, Sebastian
On Jul 21, 2017, at 2:37 PM, Raga Markely <raga.markely@gmail.com> wrote:
Thank you, Jacob. Appreciate it.
Regarding 'perform better', I was referring to better accuracy, precision, recall, F1 score, etc.
Thanks, Raga
On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <jmschreiber91@gmail.com> wrote: Traditionally, tree-based methods are very good when it comes to categorical variables and can handle them appropriately. There is a current WIP PR to add this support to sklearn. I'm not exactly sure what you mean by "perform better", though. Estimators that ignore the categorical aspect of these variables and treat them as discrete will likely perform worse than those that treat them appropriately.
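As a minimal sketch of the one-hot-plus-trees approach discussed in this thread (toy data and names are hypothetical; the WIP PR for native categorical support is not assumed here), a random forest can be fit directly on the dummy columns:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: one nominal feature, binary label
df = pd.DataFrame({
    "country": ["US", "FR", "US", "DE", "FR", "DE"],
    "label":   [0, 1, 0, 1, 1, 1],
})

# One-hot encode the nominal column; the trees split on the 0/1 dummies
X = pd.get_dummies(df[["country"]])
y = df["label"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = clf.predict(X)
print(list(pred))
```

Whether this beats treating the column as an integer code is exactly the kind of thing that has to be checked empirically per dataset.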
On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.markely@gmail.com> wrote: Hello,
I am wondering if there are some classifiers that perform better for datasets with categorical features (converted into a sparse input matrix with pd.get_dummies())? The data for the categorical features are nominal (order doesn't matter, e.g. country, occupation, etc.).
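For reference, a minimal sketch of the encoding described above (the column values here are hypothetical examples):

```python
import pandas as pd

# Nominal features: order carries no meaning
df = pd.DataFrame({"country": ["US", "FR", "DE"],
                   "occupation": ["eng", "doc", "eng"]})

# get_dummies produces one binary indicator column per category level
dummies = pd.get_dummies(df)
print(dummies.columns.tolist())
# ['country_DE', 'country_FR', 'country_US', 'occupation_doc', 'occupation_eng']
```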
If you could provide me some references (papers, books, website, etc), that would be great.
Thank you very much! Raga
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
-- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette, France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux