[scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Tue Jun 4 21:43:09 EDT 2019


Dear CW,


> Linear regression is not a black-box. I view prediction accuracy as an
> overkill on interpretable models. Especially when you can use R-squared,
> coefficient significance, etc.
>

As with my previous note about being cautious with cross-validated
evaluation for classification, the same caution applies to regression.
About 20 years ago, chemoinformatics researchers pointed out the caution
needed with using CV-based R^2 (q^2) as a measure of performance.
"Beware of q2!"  Golbraikh and Tropsha, J Mol Graph Modeling (2002) 20:269
https://www.sciencedirect.com/science/article/pii/S1093326301001231

In this article, they propose measuring correlation with both
known-vs-predicted _and_ predicted-vs-known calculations of the correlation
coefficient and, importantly, requiring that the fitted regression line
pass through the origin in both cases.
The resulting coefficients are checked as a pair, and the authors argue
that only if they are both high can one say that the model is fitting the
data well.

Contrast this with the Pearson product-moment correlation (R), where the
fitted line is not required to pass through the origin.
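
To make the contrast concrete, here is a minimal sketch in plain Python. It
is not my own implementation and not necessarily the exact formulas of the
paper; it assumes one common reading, in which each coefficient is
1 - SS_res / SS_tot for a least-squares line forced through the origin. The
function names and the toy data are hypothetical, chosen only for
illustration.

```python
def slope_through_origin(x, y):
    """Least-squares slope k of the line y = k * x (no intercept term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def r2_through_origin(x, y):
    """1 - SS_res / SS_tot for y regressed on x with the line forced through 0."""
    k = slope_through_origin(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - k * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def pearson_r2(x, y):
    """Squared Pearson correlation; the implied line may have any intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


# Toy data (assumed): predictions are perfectly correlated with the known
# values but shifted by a constant offset of 10.
known = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [v + 10.0 for v in known]

r2_pearson = pearson_r2(known, predicted)
r2_known_vs_pred = r2_through_origin(predicted, known)   # known on predicted
r2_pred_vs_known = r2_through_origin(known, predicted)   # predicted on known

print(r2_pearson, r2_known_vs_pred, r2_pred_vs_known)
```

With this toy data the squared Pearson correlation is essentially perfect,
while both origin-constrained coefficients are poor, so checking them as a
pair flags the systematic offset that Pearson R ignores.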

I found the paper above to be helpful in filtering for more robust
regression models, and have implemented my own version of their method,
which I use as my first evaluation metric when performing regression
modelling.

Hope this gives you some food for thought.

> Prediction accuracy also does not tell you which feature is important.
>

The contributions of the scikit-learn community have yielded a great set of
tools for performing feature weighting separate from model performance
evaluation.
All you need to do is read the documentation and try out some of the
examples, and you should be ready to adapt to your situation.

J.B.