[scikit-learn] Naive Bayes - Multinomial Naive Bayes tf-idf

Andy t3kcit at gmail.com
Fri Nov 4 10:43:36 EDT 2016



On 11/04/2016 05:45 AM, Marcin Mirończuk wrote:
> Hi,
> In our experiments, we use Multinomial Naive Bayes (MNB). The
> traditional MNB assumes TF (term frequency) weights for the words. The
> documentation at http://scikit-learn.org/stable/modules/naive_bayes.html
> describes Multinomial Naive Bayes as a model "... where the data are
> typically represented as word vector counts, although tf-idf vectors
> are also known to work well in practice". The "word vector counts" are
> TF weights, which are well known. Our question concerns the "tf-idf
> vectors". For the tf-idf case, does scikit-learn implement the approach
> of Rennie et al., "Tackling the Poor Assumptions of Naive
> Bayes Text Classification"? The documentation does not cite
> this paper.
No, I think that paper implements something slightly different. The
documentation only says that you can use TfidfVectorizer instead of
CountVectorizer to build the input for MultinomialNB, and it can still
work.
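
As a minimal sketch of that swap: the toy corpus and labels below are
made up for illustration, and make_pipeline is just one convenient way
to chain the vectorizer and the classifier. The classifier is the same
MultinomialNB in both cases; only the features fed to it change, so this
is not meant as an implementation of the method in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus and labels, invented purely for illustration.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Same classifier, two different feature extractors:
# raw term counts (TF) versus tf-idf weights.
count_nb = make_pipeline(CountVectorizer(), MultinomialNB())
tfidf_nb = make_pipeline(TfidfVectorizer(), MultinomialNB())

count_nb.fit(docs, labels)
tfidf_nb.fit(docs, labels)

# Classify an unseen document with both pipelines.
new_doc = ["buy cheap pills"]
count_pred = count_nb.predict(new_doc)
tfidf_pred = tfidf_nb.predict(new_doc)
print(count_pred, tfidf_pred)
```

Whether the tf-idf variant actually helps depends on the data, so it is
worth comparing both pipelines with cross-validation on your own corpus.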

