[scikit-learn] creating a custom scoring function for cross-validation of classification

Tue Nov 1 12:52:35 EDT 2016

Aha, thanks Andy!
That works...

On Tue, Nov 1, 2016 at 7:05 AM, Andy <t3kcit at gmail.com> wrote:

> Hi.
> If you want to pass a custom scorer, you need to pass the scorer, not a
> string with the scorer name.
> Andy
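A minimal sketch of the fix Andy describes, assuming a scikit-learn release with `sklearn.model_selection`. The `scoring` argument accepts either the name of a built-in scorer (a string) or a scorer object, so a custom scorer built with `make_scorer` has to be passed as the object itself. The iris data, `LogisticRegression`, and the name `my_scorer` are illustrative stand-ins, not taken from the thread:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# make_scorer wraps a metric f(y_true, y_pred) into a scorer object
my_scorer = make_scorer(accuracy_score)

# Works: the scorer object itself is passed
scores = cross_val_score(clf, X, y, cv=3, scoring=my_scorer)
print(scores.mean())

# Fails: a string is only looked up among the built-in scorer names
string_rejected = False
try:
    cross_val_score(clf, X, y, cv=3, scoring="my_scorer")
except ValueError:
    string_rejected = True
print("string name rejected:", string_rejected)
```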
>
>
> On 10/31/2016 04:28 PM, Sumeet Sandhu wrote:
>
> Hi,
>
> I've been staring at various doc pages for a while, trying to create a custom
> scorer that uses the predict_proba output of a multi-class SGDClassifier:
> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score
> http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer
>
> I got the impression I could customize the "scoring" parameter in
> cross_val_score directly, but that didn't work.
> Then I tried customizing the "score_func" parameter in make_scorer, but
> that didn't work either. Both attempts fail with a ValueError:
>
> Traceback (most recent call last):
>   File "<pyshell#96>", line 3, in <module>
>     accuracy = mean(cross_val_score(LRclassifier, trainPatentVecs, trainLabelVecs, cv=10, scoring = 'topNscorer'))
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1425, in cross_val_score
>     scorer = check_scoring(estimator, scoring=scoring)
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 238, in check_scoring
>     return get_scorer(scoring)
>   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/metrics/scorer.py", line 197, in get_scorer
>     % (scoring, sorted(SCORERS.keys())))
> ValueError: 'topNscorer' is not a valid scoring value. Valid options are
> ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro',
> 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error',
> 'mean_squared_error', 'median_absolute_error', 'precision',
> 'precision_macro', 'precision_micro', 'precision_samples',
> 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro',
> 'recall_samples', 'recall_weighted', 'roc_auc']
>
> At a high level, I want to find out if the true label was found in the top
> N multi-class labels coming out of an SGD classifier. Built-in scores like
> "accuracy" only look at N=1.
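The "true label among the top N" check can be written without a per-sample loop. A minimal numpy sketch, assuming the labels are integer class indices 0..K-1 that line up with the columns of the `predict_proba` output; `top_n_hit_rate` is a hypothetical helper name:

```python
import numpy as np


def top_n_hit_rate(y_true_idx, proba, n=5):
    """Fraction of samples whose true class index is among the n
    highest-probability columns of proba (n_samples x n_classes)."""
    proba = np.asarray(proba)
    # argsort is ascending, so the last n columns hold the n largest values
    top_n = np.argsort(proba, axis=1)[:, -n:]
    # A row is a hit if any of its top-n column indices equals the true index
    hits = (top_n == np.asarray(y_true_idx)[:, None]).any(axis=1)
    return hits.mean()


proba = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.2, 0.3, 0.5]])
y = np.array([2, 1, 0])
print(top_n_hit_rate(y, proba, n=1))  # 0.0
print(top_n_hit_rate(y, proba, n=3))  # 1.0
```

With n=1 this reduces to ordinary accuracy; raising n can only keep or increase the hit rate.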
>
> Here is the code using make_scorer:
>     LRclassifier = SGDClassifier(loss='log')
>     topNscorer = make_scorer(topNscoring, greater_is_better=True, needs_proba=True)
>     accuracyN = mean(cross_val_score(LRclassifier, Data, Labels, scoring = 'topNscorer'))
>
> Here is the code for the custom scoring function:
> import numpy as np
>
> def topNscoring(y, yp):
>     # Inputs: y = true label per sample,
>     #         yp = predict_proba probabilities of all labels per sample
>     N = 5
>     foundN = []
>     for ii in range(yp.shape[0]):
>         # column indices of the N highest probabilities for sample ii
>         indN = [w[0] for w in sorted(enumerate(yp[ii, :]),
>                                      key=lambda w: w[1], reverse=True)[:N]]
>         foundN.append(1 if y[ii] in indN else 0)
>     return np.mean(foundN)
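One caveat with `topNscoring` as written: it compares label values with column indices of `yp`. `predict_proba` orders its columns by `estimator.classes_`, so the two only agree when the labels are the integers 0..K-1 and every class appears in each training fold. A sketch that sidesteps this, assuming scikit-learn's documented option of passing any callable with the signature `scorer(estimator, X, y)` as `scoring`. `top_n_scorer` is a hypothetical name, and `LogisticRegression` stands in for `SGDClassifier(loss='log')`, whose loss was renamed `'log_loss'` in scikit-learn 1.1:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def top_n_scorer(estimator, X, y, n=2):
    """Hypothetical scorer: fraction of samples whose true label is among
    the n classes with the highest predicted probability."""
    proba = estimator.predict_proba(X)
    # Column indices of the n largest probabilities per row
    top_cols = np.argsort(proba, axis=1)[:, -n:]
    # predict_proba columns follow estimator.classes_, so translate the
    # column indices back into label values before comparing with y
    top_labels = estimator.classes_[top_cols]
    hits = (top_labels == np.asarray(y)[:, None]).any(axis=1)
    return float(hits.mean())


X, y = load_iris(return_X_y=True)
# String labels: comparing these with column indices would never match
y_named = np.array(["setosa", "versicolor", "virginica"])[y]

clf = LogisticRegression(max_iter=1000)
# The callable itself is passed as scoring, not the string "top_n_scorer"
scores = cross_val_score(clf, X, y_named, cv=5, scoring=top_n_scorer)
print(scores.mean())
```

On scikit-learn 0.24 or later, `sklearn.metrics.top_k_accuracy_score` and the built-in `'top_k_accuracy'` scoring name cover this case directly.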
>
> Any help will be greatly appreciated.
>
> best regards,
> Sumeet
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn