[scikit-learn] Problem with nested cross-validation example?
Albert Thomas
albertthomas88 at gmail.com
Tue Nov 29 04:04:49 EST 2016
When I was reading Sebastian's blog posts on cross-validation a few weeks
ago, I also found the nested cross-validation example on scikit-learn. At
first, like Daniel, I thought the example was not doing what it should be
doing, but after a few minutes I realized that it was correct. So I
am in favour of a bit more clarification.
Albert
On Tue, 29 Nov 2016 at 02:53, Sebastian Raschka <se.raschka at gmail.com> wrote:
> At first glance, the figure Daniel linked and the code example seem to
> do/show the same thing? Maybe it would be worth adding an explanatory
> figure like this to the docs to clarify?
>
> > On Nov 28, 2016, at 7:07 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
> >
> > If that clarifies, please offer changes to the example (as a pull
> > request) that make this clearer.
> >
> > On 29 November 2016 at 11:06, Joel Nothman <joel.nothman at gmail.com> wrote:
> > Briefly:
> >
> > clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
> > nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
> >
> > Each train/test split in cross_val_score holds out test data.
> > GridSearchCV then splits each train set into (inner-)train and validation
> > sets. There is no leakage of test set knowledge from the outer loop into
> > the grid search optimisation, and no leakage of validation set knowledge
> > into the SVR optimisation. The outer test data are reused as training data
> > in other splits, but within each split they are only used to measure
> > generalisation error.
> >
> > Is that clear?
> >
> > On 29 November 2016 at 10:30, Daniel Homola <dani.homola at gmail.com> wrote:
> > Dear all,
> >
> > I was wondering if the following example code is valid:
> >
> > http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
> >
> > My understanding is that the point of nested cross-validation is to
> > prevent any data leakage from the inner grid-search/parameter-optimization
> > CV loop into the outer model-evaluation CV loop. This could be achieved if
> > the outer CV loop's test data are completely separated from the inner
> > loop's CV, as shown here:
> >
> > https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
> >
> > The code in the above example, however, doesn't seem to achieve this in
> > any way.
> >
> > Am I missing something here?
> >
> > Thanks a lot,
> > dh
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
>
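For readers who want to run Joel's two-line snippet, here is a minimal, self-contained sketch on the iris data. It assumes scikit-learn 0.18 or later (for `sklearn.model_selection` and `load_iris(return_X_y=True)`); the RBF `SVC`, the `C`/`gamma` grid and the shuffled 4-fold `KFold` splitters are illustrative assumptions and may differ from the documentation example. The names `svr`, `p_grid`, `inner_cv`, `outer_cv`, `X_iris` and `y_iris` follow the snippet.

```python
# Runnable sketch of the nested CV snippet from the thread.
# Assumed (not from the thread): the SVC kernel, the parameter grid,
# and the KFold settings below.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Estimator whose hyperparameters are tuned in the inner loop.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Inner CV: used only by GridSearchCV to choose C and gamma.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
# Outer CV: used only to estimate generalisation error.
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# GridSearchCV is itself an estimator: its fit() runs the inner search on
# whatever training data it receives, then refits the best model on it.
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# cross_val_score fits a clone of clf on each outer training fold only and
# scores the refitted best model on the held-out outer test fold.
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
print(nested_score, nested_score.mean())
```

The key point is that `GridSearchCV` is an ordinary estimator here: whatever data `cross_val_score` passes to its `fit` is all the inner grid search ever sees.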
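To address Daniel's concern directly, the same nested procedure can be unrolled by hand so that the separation shown in the mlr figure is explicit. This sketch reuses illustrative assumptions (RBF `SVC`, a small `C`/`gamma` grid, shuffled 4-fold `KFold` with fixed seeds, none of which come from the thread) and checks that a hand-written outer loop gives the same per-fold scores as `cross_val_score`. With fixed seeds and a deterministic estimator, the two should agree.

```python
# Hand-unrolled outer loop; intended to match cross_val_score(clf, X, y,
# cv=outer_cv) with the default scorer. Grid and fold settings are assumed.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

manual_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Outer train and test indices never overlap.
    assert np.intersect1d(train_idx, test_idx).size == 0
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # Fresh, unfitted copy per outer fold (cross_val_score also clones).
    search = clone(clf)
    # The inner grid search sees ONLY the outer training fold; inner_cv
    # splits X_train into inner-train and validation sets.
    search.fit(X_train, y_train)

    # The outer test fold is used once, to score the refitted best model.
    manual_scores.append(search.score(X_test, y_test))

manual_scores = np.array(manual_scores)
auto_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(manual_scores)
print(auto_scores)
```

Each outer test fold is passed to `search.score` once and never to `search.fit`, which is the separation drawn in the nested resampling figure.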