[scikit-learn] creating a custom scoring function for cross-validation of classification

Andy t3kcit at gmail.com
Tue Nov 1 10:05:54 EDT 2016


Hi.
If you want to use a custom scorer, you need to pass the scorer object 
itself, not a string with the scorer's name.
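For example, something along these lines (a minimal sketch, assuming the
topNscoring function and the Data/Labels arrays from your message below;
on older scikit-learn versions import cross_val_score from
sklearn.cross_validation instead):

    from numpy import mean
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_val_score

    LRclassifier = SGDClassifier(loss='log')
    # wrap the plain scoring function into a scorer object
    topNscorer = make_scorer(topNscoring, greater_is_better=True,
                             needs_proba=True)
    # pass the scorer object itself, not the string 'topNscorer'
    accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, cv=10,
                                     scoring=topNscorer))
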
Andy

On 10/31/2016 04:28 PM, Sumeet Sandhu wrote:
> Hi,
>
> I've been staring at various doc pages for a while, trying to create a custom 
> scorer that uses the predict_proba output of a multi-class SGDClassifier:
> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
> http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer
>
> I got the impression I could customize the "scoring" parameter in 
> cross_val_score directly, but that didn't work.
> Then I tried customizing the "score_func" parameter in make_scorer, 
> but that didn't work either. Both attempts raise a ValueError:
>
> Traceback (most recent call last):
>   File "<pyshell#96>", line 3, in <module>
>     accuracy = mean(cross_val_score(LRclassifier, trainPatentVecs, 
> trainLabelVecs, cv=10, scoring = 'topNscorer'))
>   File 
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cross_validation.py", 
> line 1425, in cross_val_score
>     scorer = check_scoring(estimator, scoring=scoring)
>   File 
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", 
> line 238, in check_scoring
>     return get_scorer(scoring)
>   File 
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", 
> line 197, in get_scorer
>     % (scoring, sorted(SCORERS.keys())))
> ValueError: 'topNscorer' is not a valid scoring value. Valid options 
> are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 
> 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 
> 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 
> 'precision', 'precision_macro', 'precision_micro', 
> 'precision_samples', 'precision_weighted', 'r2', 'recall', 
> 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 
> 'roc_auc']
>
> At a high level, I want to find out whether the true label appears among the 
> top N multi-class labels predicted by an SGD classifier. Built-in 
> scores like "accuracy" only look at N=1.
>
> Here is the code using make_scorer:
>         LRclassifier = SGDClassifier(loss='log')
>         topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)
>         accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring = 'topNscorer'))
>
> Here is the code for the custom scoring function:
> def topNscoring(y, yp):
>     ## Inputs: y = true label per sample, yp = predict_proba probabilities of all labels per sample
>     N = 5
>     foundN = []
>     for ii in xrange(0, shape(yp)[0]):
>         indN = [w[0] for w in sorted(enumerate(list(yp[ii,:])), key=lambda w: w[1], reverse=True)[0:N]]
>         if y[ii] in indN: foundN.append(1)
>         else:             foundN.append(0)
>     return mean(foundN)
>
> Any help will be greatly appreciated.
>
> best regards,
> Sumeet


