Hello,

For a university project I worked on a sentiment analysis challenge (movie reviews) and implemented a version of NBSVM as described in this paper: http://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf

If I am not mistaken, there is no NBSVM class in scikit-learn. I would therefore like to contribute by coding an NB matrix class, provided the work has not already been done by someone else.

Best regards,
Ivaylo Petkantchin
I think it could be implemented as a preprocessing step. This is the approach followed by: https://github.com/ryankiros/skip-thoughts/blob/master/eval_classification.p...

Note that in that case LogisticRegression is used as the final classifier instead of a squared hinge loss SVM, but that should not change much in practice.

If you want to make this approach scikit-learn compatible (to work with the Pipeline and sklearn's model selection tools, for instance), be sure to implement the Transformer API as documented here: http://scikit-learn.org/dev/developers/contributing.html#apis-of-scikit-lear...

Please also read the rest of the contribution guide: http://scikit-learn.org/dev/developers

NBSVM is quite recent and might not strictly meet the conditions for inclusion as stated in: http://scikit-learn.org/stable/faq.html#can-i-add-this-new-algorithm-that-i-...

It already has 163 citations though: https://scholar.google.com/scholar?oi=bibs&hl=en&cites=1710642630990759287

As this is a really strong baseline, the model is not complex, and it should blend well within the scikit-learn API, I would be +1 for inclusion in sklearn.

-- Olivier
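To illustrate the "preprocessing step" idea above, here is a minimal sketch of a transformer that scales each feature by the Naive Bayes log-count ratio from the paper, r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are the smoothed per-class feature counts. The class name NBLogCountRatioTransformer, the alpha parameter name, and the toy documents are assumptions made for illustration, not existing scikit-learn API. The sketch handles binary problems only and omits the interpolation between NB and SVM weights described in the paper. As in the linked script, LogisticRegression stands in for the squared hinge loss SVM.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class NBLogCountRatioTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: multiplies each feature by its NB log-count ratio."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing applied to the per-class counts

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        if len(self.classes_) != 2:
            raise ValueError("This sketch only handles binary problems.")
        X = sparse.csr_matrix(X)
        pos_rows = np.flatnonzero(y == self.classes_[1])
        neg_rows = np.flatnonzero(y == self.classes_[0])
        # Smoothed feature counts for the positive (p) and negative (q) class.
        p = self.alpha + np.asarray(X[pos_rows].sum(axis=0)).ravel()
        q = self.alpha + np.asarray(X[neg_rows].sum(axis=0)).ravel()
        # Log-count ratio of the L1-normalised count vectors.
        self.r_ = np.log((p / p.sum()) / (q / q.sum()))
        return self

    def transform(self, X):
        # Scale column j by r_[j]; the result stays sparse.
        return sparse.csr_matrix(X).dot(sparse.diags(self.r_)).tocsr()


# Toy data, invented purely for illustration.
docs = ["good great film", "great fun", "bad boring film", "boring awful"]
labels = [1, 1, 0, 0]

model = make_pipeline(
    CountVectorizer(binary=True),   # binarised counts, as the paper suggests
    NBLogCountRatioTransformer(alpha=1.0),
    LogisticRegression(),
)
model.fit(docs, labels)
```

Because the transformer follows the fit/transform convention and inherits from BaseEstimator, it can be dropped into a Pipeline and tuned with the usual model selection tools, for example by searching over nblogcountratiotransformer__alpha.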
Thank you for your answer! I will start working on all the requirements for the scikit-learn API.

2016-06-07 10:11 GMT+02:00 Olivier Grisel <olivier.grisel@ensta.org>:
> I think it could be implemented as a preprocessing step [...]
Participants (2): Ivo Petkantchin, Olivier Grisel