[scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?
C W
tmrsg11 at gmail.com
Wed Jun 12 14:36:42 EDT 2019
Thank you both for the paper references.
@ Andreas,
What is your take? And what are you implying?
The Breiman (2001) paper points out the black box vs. statistical approach.
I call them black box vs. open box. He advocates black box in the paper.
Black box:
y <--- nature <--- x
Open box:
y <--- linear regression <---- x
Decision trees and neural nets are black-box models. They require a large
amount of data to train, and they skip the part where one tries to understand
nature.
Because it is a black box, you can't open it up to see what's inside. Linear
regression is a very simple model that you can use to approximate nature,
but the key thing is that you need to know how the data are generated.
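
To illustrate what I mean by "open box", here is a minimal sketch on
synthetic data. The generating coefficients (true_coef, true_intercept) and
the noise level are made-up assumptions for illustration only. When the data
really are generated by a linear process, the fitted coef_ and intercept_ can
be read off directly and compared with it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data-generating process (an assumption for this sketch):
# y = 3*x1 - 2*x2 + 0.5 + small Gaussian noise
rng = np.random.default_rng(42)
true_coef = np.array([3.0, -2.0])
true_intercept = 0.5

X = rng.normal(size=(500, 2))
y = X @ true_coef + true_intercept + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)

# The fitted parameters are directly inspectable.
print("estimated coefficients:", model.coef_)
print("estimated intercept:", model.intercept_)
```

The estimates land close to the generating values here only because the
model form matches how the data were produced.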
@ Brown,
I know nothing about molecular modeling. The paper you linked, "Beware of
q2!", raises some interesting points. As far as I can see, in sklearn linear
regression, score is R^2.
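
A quick way to check this, as a rough sketch on synthetic data (the
coefficients, noise level, and split size below are arbitrary assumptions):
LinearRegression.score returns R^2 for whatever data you pass it, so it can
be computed on the training set or on a held-out test set, and it matches
sklearn.metrics.r2_score applied to the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data; the coefficients and noise scale are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() is R^2 on the data it is given.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Same quantity computed explicitly from the predictions.
manual_test_r2 = r2_score(y_test, model.predict(X_test))

print("train R^2:", train_r2)
print("test R^2:", test_r2)
print("r2_score on test predictions:", manual_test_r2)
```

The R^2 on the training split describes the fit, while the R^2 on the test
split is a prediction measure, which I think is the distinction the paper is
getting at.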
On Wed, Jun 5, 2019 at 9:11 AM Andreas Mueller <t3kcit at gmail.com> wrote:
>
> On 6/4/19 8:44 PM, C W wrote:
> > Thank you all for the replies.
> >
> > I agree that prediction accuracy is great for evaluating black-box ML
> > models. Especially advanced models like neural networks, or
> > not-so-black models like LASSO, because they are NP-hard to solve.
> >
> > Linear regression is not a black-box. I view prediction accuracy as an
> > overkill on interpretable models. Especially when you can use
> > R-squared, coefficient significance, etc.
> >
> > Prediction accuracy also does not tell you which feature is important.
> >
> > What do you guys think? Thank you!
> >
> Did you read the paper that I sent? ;)