[scikit-learn] PR #8190: "Implement Complement Naive Bayes."

Michael Alcorn malcorn at redhat.com
Fri Jan 20 11:37:02 EST 2017


Hi all,

I would appreciate it if a couple of maintainers could take a look at my
pull request (https://github.com/scikit-learn/scikit-learn/pull/8190)
implementing the Complement Naive Bayes (CNB) classifier described in
Rennie et al. (2003). CNB regularly outperforms the standard Multinomial
Naive Bayes (MNB) classifier on real-world data sets because such data sets
tend to suffer from class imbalance. Apache Mahout
offers its own implementation of CNB alongside MNB, but it would be nice to
have an easily usable CNB implementation available in scikit-learn.
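
For anyone unfamiliar with the idea, here is a minimal pure-Python sketch of
CNB as I understand it from the paper. It is not the code in the pull
request: the function names (train_cnb, predict_cnb), the smoothing default,
and the toy data are made up for illustration, and the sketch leaves out the
weight normalization step that the paper also describes. Each class's
weights are estimated from the token counts of all *other* classes, and a
document is assigned to the class whose complement fits it worst.

```python
import math
from collections import defaultdict


def train_cnb(docs, labels, alpha=1.0):
    """Estimate complement weights.

    docs   -- list of dicts mapping token -> count (hypothetical input format)
    labels -- list of class labels, one per document
    alpha  -- additive smoothing constant (assumed default)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    classes = sorted(set(labels))

    # Token counts per class, and token counts over the whole corpus.
    class_counts = {c: defaultdict(float) for c in classes}
    total_counts = defaultdict(float)
    for doc, label in zip(docs, labels):
        for tok, n in doc.items():
            class_counts[label][tok] += n
            total_counts[tok] += n

    weights = {}
    for c in classes:
        # Complement counts: occurrences of each token OUTSIDE class c.
        comp = {tok: total_counts[tok] - class_counts[c][tok] for tok in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {tok: math.log((comp[tok] + alpha) / denom) for tok in vocab}
    return weights


def predict_cnb(weights, doc):
    """Return the class whose complement model scores the document lowest."""
    scores = {
        # Tokens unseen during training contribute nothing to the score.
        c: sum(n * w.get(tok, 0.0) for tok, n in doc.items())
        for c, w in weights.items()
    }
    return min(scores, key=scores.get)


# Toy, deliberately imbalanced training set (three documents vs. one).
docs = [
    {"goal": 2, "match": 1},
    {"goal": 1, "team": 1},
    {"match": 2, "team": 1},
    {"vote": 2, "law": 1},
]
labels = ["sports", "sports", "sports", "politics"]

weights = train_cnb(docs, labels)
print(predict_cnb(weights, {"vote": 1, "law": 1}))
print(predict_cnb(weights, {"goal": 1, "match": 1}))
```

Because every class's weights are estimated from the rest of the corpus, a
class with few training documents still gets its weights from a comparatively
large sample, which is the motivation the paper gives for the approach.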

On a reasonably sized data set (493,038 documents with a median length of
87 tokens and 1,155,784 distinct tokens), training the CNB classifier took
around 8.5 seconds. For comparison, the MNB classifier took around 4.5
seconds to train, but CNB had a 10% lower error rate, which seems like a
worthwhile tradeoff.

Happy to answer any questions or discuss further.

Thanks,
Michael A. Alcorn