[scikit-learn] Why is cross_val_predict discouraged?

Gael Varoquaux gael.varoquaux at normalesup.org
Wed Apr 3 09:28:52 EDT 2019


On Wed, Apr 03, 2019 at 08:54:51AM -0400, Andreas Mueller wrote:
> If the loss decomposes, the result might be different b/c of different test
> set sizes, but I'm not sure if they are "worse" in some way?

Mathematically, cross-validation estimates a double expectation: one
expectation over the model (i.e. over the train data it is fit on), and
another over the test data (see for instance eq 3 in
https://europepmc.org/articles/pmc5441396; sorry for the self-citation,
this is seldom discussed in the literature).
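
In symbols (my notation, not copied from the paper): writing l for the
loss, D for a training split, and f_D for the model fit on D, the
quantity of interest is roughly

    E_D [ E_(x,y) [ l(y, f_D(x)) ] ]

with the inner expectation taken over unseen test points and the outer
one over training sets.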

The correct way to compute this double expectation is to average first
inside each fold and then across the folds. Other ways of computing
errors estimate other quantities, which are harder to study
mathematically and are not comparable to the objects studied in the
literature.
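
A minimal sketch of the two computations, to make the contrast concrete
(the data, model, and metric here are illustrative choices of mine, not
from the original message):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import (KFold, cross_val_predict,
                                         cross_val_score)

    X, y = make_regression(n_samples=103, noise=10.0, random_state=0)
    cv = KFold(n_splits=5)  # 103 samples -> folds of unequal size

    # Average first inside each fold, then across the folds:
    per_fold = cross_val_score(Ridge(), X, y, cv=cv,
                               scoring="neg_mean_absolute_error")
    print("mean of per-fold MAE:", -per_fold.mean())

    # Pool all out-of-fold predictions, then one global error:
    pooled = cross_val_predict(Ridge(), X, y, cv=cv)
    print("MAE on pooled predictions:", mean_absolute_error(y, pooled))

MAE decomposes over samples, so here the two numbers differ only
through the unequal fold sizes (Andreas's point above); with a metric
that does not decompose, the gap can be qualitatively worse.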

Another problem with cross_val_predict is that some people score its
pooled predictions with metrics like correlation (a terrible metric
here, precisely because it does not decompose across folds). Such a
metric will then pick up things like correlations across folds, rather
than within-fold prediction quality.
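
A hypothetical toy example of that failure mode (entirely synthetic,
just to show the mechanism): predictions carrying no within-fold
signal, but tracking the fold-to-fold level of the target.

    import numpy as np

    rng = np.random.default_rng(0)
    fold_means = [0.0, 2.0, 4.0, 6.0, 8.0]  # target drifts across folds
    y_all, p_all, r_within = [], [], []
    for m in fold_means:
        y = m + rng.normal(size=200)
        p = m + rng.normal(size=200)  # right level, zero real signal
        r_within.append(np.corrcoef(y, p)[0, 1])
        y_all.append(y)
        p_all.append(p)

    print("within-fold correlations:", np.round(r_within, 2))  # ~ 0
    pooled = np.corrcoef(np.concatenate(y_all),
                         np.concatenate(p_all))[0, 1]
    print("correlation on pooled predictions:", round(pooled, 2))  # ~ 0.9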

All these problems are made worse when the data are not i.i.d., and
hence the folds risk not being identically distributed.
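
If the samples come in groups (subjects, sites, sessions), here is a
sketch of a group-aware alternative, under the assumption that group
labels are available (again my example, not from the original message):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GroupKFold, cross_val_score

    X, y = make_regression(n_samples=120, random_state=0)
    groups = np.repeat(np.arange(6), 20)  # e.g. 6 subjects, 20 samples
    # Keep each group in a single fold, and average per-fold scores
    # rather than pooling predictions across heterogeneous folds:
    scores = cross_val_score(Ridge(), X, y, groups=groups,
                             cv=GroupKFold(n_splits=3), scoring="r2")
    print(scores.mean(), scores.std())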

G

