creating a custom scoring function for cross-validation of classification
Hi,

I've been staring at various doc pages for a while, trying to create a custom scorer that uses the predict_proba output of a multi-class SGDClassifier:

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer

I got the impression I could customize the "scoring" parameter in cross_val_score directly, but that didn't work. Then I tried customizing the "score_func" parameter in make_scorer, but that didn't work either. Both attempts fail with ValueErrors:

    Traceback (most recent call last):
      File "<pyshell#96>", line 3, in <module>
        accuracy = mean(cross_val_score(LRclassifier, trainPatentVecs, trainLabelVecs, cv=10, scoring = 'topNscorer'))
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1425, in cross_val_score
        scorer = check_scoring(estimator, scoring=scoring)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 238, in check_scoring
        return get_scorer(scoring)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 197, in get_scorer
        % (scoring, sorted(SCORERS.keys())))
    ValueError: 'topNscorer' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']

At a high level, I want to find out whether the true label appears among the top N multi-class labels coming out of an SGD classifier. Built-in scores like "accuracy" only look at N=1.

Here is the code using make_scorer:

    LRclassifier = SGDClassifier(loss='log')
    topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)
    accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring = 'topNscorer'))

Here is the code for the custom scoring function:

    def topNscoring(y, yp):
        ## Inputs: y = true label per sample, yp = predict_proba probabilities of all labels per sample
        N = 5
        foundN = []
        for ii in xrange(0, shape(yp)[0]):
            indN = [w[0] for w in sorted(enumerate(list(yp[ii, :])), key=lambda w: w[1], reverse=True)[0:N]]
            if y[ii] in indN:
                foundN.append(1)
            else:
                foundN.append(0)
        return mean(foundN)

Any help will be greatly appreciated.

best regards,
Sumeet
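[As a concrete illustration of the top-N criterion described above: the probabilities below are made up, and the class labels are assumed to be the column indices of predict_proba, just as in the code in the post.]

    import numpy as np

    # Made-up probabilities for one sample over five classes.
    yp = np.array([[0.10, 0.30, 0.05, 0.50, 0.05]])
    y = [1]          # true label of that sample
    N = 2

    # Column indices of the N highest probabilities.
    topN = np.argsort(yp[0])[::-1][:N]     # array([3, 1])
    print(y[0] in topN)                    # True: the true label is among the top 2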
Hi.

If you want to pass a custom scorer, you need to pass the scorer object itself, not a string with the scorer's name.

Andy
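[Concretely, with the names from the original message, that means passing the make_scorer result itself; a minimal sketch reusing the post's topNscoring function and variables:]

    topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)

    # Pass the scorer object; a string such as 'topNscorer' is only looked up
    # among scikit-learn's built-in scorer names, which is what raised the ValueError.
    accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring=topNscorer))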
Aha - thanks Andy! That works.
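[For later readers, a self-contained sketch of the whole pattern, with synthetic data and a vectorised scoring function. Like the original code, it assumes the class labels are the integers 0..n_classes-1 in the same order as the predict_proba columns.]

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_val_score

    def top_n_scoring(y_true, y_proba, n=5):
        """Fraction of samples whose true label is among the n most probable classes."""
        # Column indices of the n largest probabilities for every sample.
        top_n = np.argsort(y_proba, axis=1)[:, -n:]
        hits = [y_true[i] in top_n[i] for i in range(len(y_true))]
        return np.mean(hits)

    X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                               n_classes=8, random_state=0)

    clf = SGDClassifier(loss='log')  # the loss is named 'log_loss' in recent scikit-learn releases
    scorer = make_scorer(top_n_scoring, greater_is_better=True, needs_proba=True, n=5)

    # Pass the scorer object, not a string.
    scores = cross_val_score(clf, X, y, cv=10, scoring=scorer)
    print(scores.mean())

[In recent scikit-learn releases, sklearn.metrics.top_k_accuracy_score provides essentially this metric out of the box.]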