[scikit-learn] Why is cross_val_predict discouraged?

Wed Apr 3 06:28:22 EDT 2019

I use

sum((cross_val_predict(model, X, y) - y)**2) / len(y)        (*)

to evaluate the performance of a model. This is consistent with Murphy,
Machine Learning, section 6.5.3, and Hastie et al., The Elements of
Statistical Learning, eq. 7.48. However, according to the documentation
of cross_val_predict, "it is not appropriate to pass these predictions
into an evaluation metric". While it is obvious that cross_val_predict
is different from cross_val_score, I don't see what is wrong with
(*).
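
To make the comparison concrete, here is a minimal sketch that computes
(*) next to the cross_val_score average. The synthetic data set, the
LinearRegression model, and the 5-fold KFold splitter are assumptions
chosen only for illustration; they are not part of the original question.
The sample count of 103 is deliberate, so that the folds have unequal
sizes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Illustrative data only: 103 samples, so 5 folds have sizes 21/21/21/20/20.
X, y = make_regression(n_samples=103, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

# A fixed integer random_state makes every call to cv.split() yield the same folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# (*): pool all out-of-fold predictions, then average the squared errors once.
y_pred = cross_val_predict(model, X, y, cv=cv)
pooled_mse = np.mean((y_pred - y) ** 2)

# cross_val_score: compute the MSE inside each fold, then average the fold scores.
fold_mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
mean_fold_mse = fold_mse.mean()

# Weighting each fold's MSE by its test-set size recovers the pooled estimate.
fold_sizes = np.array([len(test_idx) for _, test_idx in cv.split(X)])
weighted_fold_mse = np.average(fold_mse, weights=fold_sizes)

print("pooled MSE (*):         ", pooled_mse)
print("mean of fold MSEs:      ", mean_fold_mse)
print("size-weighted fold MSE: ", weighted_fold_mse)
```

For a metric that is a plain average over samples, such as the squared
error here, the pooled estimate and the fold average differ only through
the fold-size weighting. For metrics that do not decompose per sample
(R^2 or ROC AUC, for example), pooling and per-fold averaging group the
data differently and can give noticeably different numbers, which is
presumably what the warning in the documentation refers to.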

Also, the explanation that "cross_val_predict simply returns the labels
(or probabilities)"
<https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html#sklearn.model_selection.cross_val_predict>
is unclear, if not wrong. As I understand it, this function returns
estimates, not labels or probabilities.

Regards, Boris
