Evaluating the predictive accuracy of the NB model
Please guys,

What am I doing wrong with using scikitlearn from nltk to check the accuracy of the naive bayes classifier?

...readFile definition not needed
#divide the data into training and testing sets
data = readFile('Data_test/')
training_set = list_nltk[:2000000]
testing_set = list_nltk[2000000:]

#applied Bag of words as a way to select and extract feature
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(training_set.split('\n'))

#apply tfd
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

#Train the data
clf = MultinomialNB().fit(X_train_tf, training_set.split('\n'))

#now test the accuracy of the naive bayes classifier
test_data_features = count_vect.transform(testing_set)
X_new_tfidf = tf_transformer.transform(test_data_features)

predicted = clf.predict(X_new_tfidf)
print "%.3f" % nltk.classify.accuracy(clf, predicted)

The problem is when I print the nltk.classify.accuracy, it takes forever and I am suspecting this is because I have done something wrong but since I get no error, I can't figure out what it is that is wrong. I would really appreciate any pointer.
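For comparison, here is a minimal sketch of how the same pipeline could be scored entirely within scikit-learn. It is not taken from the thread. The toy documents, the label lists (`train_labels`, `test_labels`) and the use of `accuracy_score` are all assumptions made for illustration, since the post does not show where its class labels come from. Two things it tries to make visible: the classifier is fitted on one label per document (the posted code appears to pass the text lines themselves as the targets, which would make every distinct line its own class), and accuracy is computed by comparing predictions against held-out true labels. `nltk.classify.accuracy` expects an NLTK classifier plus a list of (featureset, label) pairs, so it is probably not the right tool for a scikit-learn estimator.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: the original post does not show its documents or labels.
train_docs = [
    "good great film",
    "great acting good plot",
    "bad boring film",
    "boring plot bad acting",
]
train_labels = ["pos", "pos", "neg", "neg"]  # one class label per training document
test_docs = ["good film", "boring acting"]
test_labels = ["pos", "neg"]                 # held-out true labels

# Bag of words, fitted on the training documents only.
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_docs)

# Term-frequency scaling without IDF, as in the post.
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

# Train on (features, labels), not on (features, raw text).
clf = MultinomialNB().fit(X_train_tf, train_labels)

# Reuse the fitted vectorizer and transformer on the test documents.
X_test_tf = tf_transformer.transform(count_vect.transform(test_docs))
predicted = clf.predict(X_test_tf)

# Fraction of test documents whose predicted label matches the true label.
accuracy = accuracy_score(test_labels, predicted)
print("%.3f" % accuracy)
```

With real data, `train_labels` and `test_labels` would come from wherever the corpus stores its categories, and the split would normally be made on (document, label) pairs together so the two stay aligned.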
Sorry, wrong mailing list. On 16 April 2016 at 10:05, potato_head <kanohen@gmail.com> wrote:
participants (2)
- potato_head
- Stéfan van der Walt