[scikit-learn] Difference in prediction accuracy using SGDClassifier and Cross validation scores.

Rajnish kamboj rajnishk7.info at gmail.com
Sat Mar 9 13:34:57 EST 2019


Hi

I have recently started learning machine learning, and this is my first
question. It concerns prediction accuracy.

The accuracy I compute by hand from SGDClassifier's predictions differs from
the scores returned by cross_val_score.

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier

mnist = fetch_openml('mnist_784', version=1, cache=True)
X, y = mnist['data'], mnist['target']
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
shuffled_index = np.random.permutation(60000) # shuffle the 0 - 60000 range
X_train, y_train = X_train[shuffled_index], y_train[shuffled_index]

y_train_5 = (y_train == '5')
y_test_5 = (y_test == '5')

sgd_clf = SGDClassifier(random_state=42, tol=1e-3, max_iter=1000)
sgd_clf.fit(X_train, y_train_5)

# Predicting for all 5s
print("####### PREDICTION STATS ##############")
y_train_5_pred = sgd_clf.predict(X_train)

print("Total y_train_5 [False|True both]]:", len(y_train_5))
print("Total y_train_5 [Only 5s]:", sum(y_train_5))

# some other digit may be predicted as 5 and some 5s may be predicted as not 5
print("Predicted 5s:", sum(y_train_5_pred))

correctly_predicted = sum(np.logical_and(y_train_5_pred, y_train_5))
print("Correct Predicted", correctly_predicted)
print("Accuracy:", correctly_predicted/sum(y_train_5) * 100)

from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring='accuracy')

My output:

####### PREDICTION STATS ##############
Total y_train_5 [False|True both]]: 60000
Total y_train_5 [Only 5s]: 5421
Predicted 5s: 3863
Correct Predicted 3574
Accuracy: 65.9287954251983
array([0.9323 , 0.96805, 0.9641 ])
#######################################

So, as far as I can see, the two numbers differ. Why?

SGDClassifier is ~65.92% accurate
cross_val_score reports ~95%

Am I comparing them the wrong way, or am I missing something?
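
One way to sanity-check the two figures is to rebuild the confusion matrix
from the four counts in the printed output. The sketch below uses only those
numbers, and the variable names are illustrative. It relies on the standard
definitions: correct 5s divided by actual 5s is recall, while
scoring='accuracy' counts correct predictions of both classes over all
samples.

```python
# Counts copied from the printed output of the script.
n_samples = 60000       # len(y_train_5)
actual_5s = 5421        # sum(y_train_5)
predicted_5s = 3863     # sum(y_train_5_pred)
true_pos = 3574         # predicted 5 and actually 5

# Remaining confusion-matrix cells follow by subtraction.
false_pos = predicted_5s - true_pos    # predicted 5, actually not 5
false_neg = actual_5s - true_pos       # actual 5s that were missed
true_neg = n_samples - true_pos - false_pos - false_neg

recall = true_pos / actual_5s                    # the value printed as "Accuracy"
precision = true_pos / predicted_5s              # share of predicted 5s that are right
accuracy = (true_pos + true_neg) / n_samples     # definition used by scoring='accuracy'
baseline = (n_samples - actual_5s) / n_samples   # always answering "not 5"

print(f"recall    = {recall:.4f}")
print(f"precision = {precision:.4f}")
print(f"accuracy  = {accuracy:.4f}")
print(f"baseline  = {baseline:.4f}")
```

With these counts, the hand-computed 65.9% is the recall on the 5s, and the
training-set accuracy works out to roughly 96.4%, which is in the same range
as the cross_val_score values. The match is not expected to be exact, because
this accuracy is measured on the data the model was fit on, whereas
cross_val_score evaluates on held-out folds. Always answering "not 5" would
already reach about 91% accuracy here, since only 5421 of the 60000 samples
are 5s.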


Thanks

Rajnish