Evaluating the predictive accuracy of the NB model
potato_head
kanohen at gmail.com
Sat Apr 16 13:05:17 EDT 2016
Hi all,
What am I doing wrong when using scikit-learn together with NLTK to check
the accuracy of a naive Bayes classifier?
...readFile definition not needed
#divide the data into training and testing sets
data = readFile('Data_test/')
training_set = list_nltk[:2000000]
testing_set = list_nltk[2000000:]
#apply bag of words to extract features
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(training_set.split('\n'))
#apply tf (term frequency) weighting
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
#Train the data
clf = MultinomialNB().fit(X_train_tf, training_set.split('\n'))
#now test the accuracy of the naive bayes classifier
test_data_features = count_vect.transform(testing_set)
X_new_tfidf = tf_transformer.transform(test_data_features)
predicted = clf.predict(X_new_tfidf)
print "%.3f" % nltk.classify.accuracy(clf, predicted)
The problem is that when I print nltk.classify.accuracy, it takes forever.
I suspect this is because I have done something wrong, but since I get no
error, I can't figure out what it is. I would really appreciate any
pointers.
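
For reference, here is a minimal sketch of one way to measure accuracy for this kind of pipeline. It is not a definitive fix. It assumes each document has its own class label, kept in a separate list; the toy documents, the labels, and the names train_docs, train_labels, test_docs and test_labels are hypothetical and do not come from the code above. It illustrates two points. First, MultinomialNB.fit takes the class labels as its second argument, so passing the training lines themselves would make every line its own class, which is one plausible reason for the long run time. Second, nltk.classify.accuracy expects an NLTK classifier plus a list of (featureset, label) pairs, so sklearn.metrics.accuracy_score is used here instead.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: one label per document, stored separately.
train_docs = ["good great film", "great fun good",
              "bad awful film", "awful boring bad"]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["good fun", "boring awful"]
test_labels = ["pos", "neg"]

# Bag of words: learn the vocabulary on the training documents only.
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_docs)

# Term-frequency weighting (no idf), as in the original snippet.
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

# Train on (features, labels); the labels are NOT the raw text.
clf = MultinomialNB().fit(X_train_tf, train_labels)

# Reuse the fitted vectorizer and transformer on the test documents.
X_test_tf = tf_transformer.transform(count_vect.transform(test_docs))
predicted = clf.predict(X_test_tf)

# Compare predictions against the held-out true labels.
accuracy = accuracy_score(test_labels, predicted)
print("%.3f" % accuracy)
```

With real data, the test labels have to come from the same source as the training labels; without true labels for the test set there is nothing to compute accuracy against.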