[scikit-learn] Scores in Cross Validation

Raga Markely raga.markely at gmail.com
Thu Jan 26 13:19:39 EST 2017


Thank you, Guillaume.

1. I agree with you - that's what I have been learning and makes sense.. I
was a bit surprised when I read the paper today..

2. Ah.. thank you.. I got to change my glasses :P

Best,
Raga

Guillaume Lemaître g.lemaitre58 at gmail.com
Thu Jan 26 12:05:12 EST 2017



------------------------------

1. You should not evaluate an estimator on the data that were used to
train it. Usually, you try to minimize the classification error or the loss
on those data and fit them as well as possible. Evaluating on an unseen
testing set gives you an idea of how well your estimator generalizes to
your problem after training. Furthermore, training, validation, and testing
sets should all be used when setting up parameters: the validation set is
used to set the parameters, and the testing set is used to evaluate your
best estimator.
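
As a rough illustration of the gap between the two kinds of score, here is a
minimal sketch. The iris dataset and the decision tree are only assumptions
chosen for the example; they do not come from the question. It uses
cross_validate with return_train_score=True to get both the score on the
folds used for fitting and the score on the held-out fold:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and estimator (assumptions, not from the thread).
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# "test_score" is computed on each held-out fold only.
# return_train_score=True additionally reports the score on the
# folds that were used to fit the estimator.
results = cross_validate(clf, X, y, cv=5, return_train_score=True)

print("train scores:", results["train_score"])
print("test scores: ", results["test_score"])
```

The training scores are typically the more optimistic of the two, which is
why they say little about generalization.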

That is why, when using GridSearchCV, fit will train the estimator using
training and validation sets (split according to a given CV strategy).
Finally, predict will be performed on another, unseen testing set.
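
A minimal sketch of that workflow follows. The dataset, the SVC estimator,
and the parameter grid are all assumptions made up for illustration. The
testing set is split off first, GridSearchCV only ever sees the remaining
data, and the held-out part is used once at the end:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Illustrative data (assumption, not from the thread).
X, y = load_iris(return_X_y=True)

# Keep a testing set aside; it is never used during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid, chosen only for the example.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# fit() splits X_train into training/validation folds internally (cv=5),
# picks the best parameters, then refits on all of X_train.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean validation score:", search.best_score_)

# Final evaluation of the best estimator on the unseen testing set.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```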

The bottom line is that using training data to select parameters will not
ensure that you are selecting the best parameters for your problem.

2. The function is called in _fit_and_score, l. 260 and 263 for instance.

On 26 January 2017 at 17:02, Raga Markely <raga.markely at gmail.com> wrote:

> Hello,
>
> I have 2 questions regarding cross_val_score.
> 1. Do the scores returned by cross_val_score correspond to only the test
> set or the whole data set (training and test sets)?
> I tried to look at the source code, and it looks like it returns the score
> of only the test set (line 145: "return_train_score=False") - I am not sure
> if I am reading the codes properly, though..
> https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
> I came across the paper below and the authors use the score of the whole
> dataset when the author performs repeated nested loop, grid search cv,
> etc.. e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3.
> https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
> I wonder what's the pros and cons of using the accuracy score of the whole
> dataset vs just the test set.. any thoughts?
>
> 2. On line 283 of the cross_val_score source code, there is a function
> _score. However, I can't find where this function is called. Could you let
> me know where this function is called?
>
> Thank you very much!
> Raga

-- 
Guillaume Lemaitre
INRIA Saclay - Ile-de-France
Equipe PARIETAL
guillaume.lemaitre at inria.fr
https://glemaitre.github.io/