Evaluating the predictive accuracy of the NB model

potato_head kanohen at gmail.com
Sat Apr 16 13:05:17 EDT 2016



Please guys,

What am I doing wrong when using scikit-learn together with NLTK to check 
the accuracy of a naive Bayes classifier?


...readFile definition not needed
#divide the data into training and testing sets
data = readFile('Data_test/')
training_set = list_nltk[:2000000]
testing_set = list_nltk[2000000:]
#apply bag of words to extract features
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(training_set.split('\n'))
#apply tf weighting (use_idf=False, so no idf)
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
#Train the data
clf = MultinomialNB().fit(X_train_tf, training_set.split('\n'))
#now test the accuracy of the naive bayes classifier
test_data_features = count_vect.transform(testing_set)
X_new_tfidf = tf_transformer.transform(test_data_features)

predicted = clf.predict(X_new_tfidf)
print "%.3f" % nltk.classify.accuracy(clf, predicted)


The problem is that when I print nltk.classify.accuracy, it takes forever. 
I suspect I have done something wrong, but since I get no error, I can't 
figure out what it is. I would really appreciate any pointers.
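
For comparison, here is a minimal sketch of the same pipeline scored with 
scikit-learn's own accuracy_score instead of nltk.classify.accuracy. 
nltk.classify.accuracy expects an NLTK classifier and a list of 
(featureset, label) pairs, not a scikit-learn estimator and an array of 
predictions. The snippet above also passes the training texts themselves 
as the labels to fit(), which may be part of the slowdown. The toy 
sentences, the "pos"/"neg" labels and the 6/2 split below are hypothetical 
stand-ins, since the real data and readFile are not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: (text, label) pairs standing in for the real corpus.
labeled = [
    ("great film, loved it", "pos"),
    ("wonderful acting and story", "pos"),
    ("loved the wonderful cast", "pos"),
    ("terrible plot, hated it", "neg"),
    ("boring and terrible acting", "neg"),
    ("hated the boring story", "neg"),
    ("great story, wonderful film", "pos"),
    ("terrible film, boring cast", "neg"),
]

# Divide the data into training and testing sets, keeping texts and labels apart.
train, test = labeled[:6], labeled[6:]
train_texts, train_labels = zip(*train)
test_texts, test_labels = zip(*test)

# Bag of words, then term-frequency weighting (no idf), as in the post.
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_texts)
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

# Train on the features with the class labels as the target.
clf = MultinomialNB().fit(X_train_tf, train_labels)

# Reuse the fitted vectorizer and transformer on the test texts.
X_test_tf = tf_transformer.transform(count_vect.transform(test_texts))
predicted = clf.predict(X_test_tf)

# Accuracy is the fraction of predictions that match the held-out labels.
accuracy = accuracy_score(test_labels, predicted)
print("%.3f" % accuracy)
```

The point of the sketch is that the second argument to fit() and the first 
argument to accuracy_score are class labels, one per document, while the 
documents themselves only go through the vectorizer.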


More information about the scikit-image mailing list