[scikit-learn] How is linear regression in scikit-learn done? Do you need train and test split?

C W tmrsg11 at gmail.com
Sat Jun 1 22:42:14 EDT 2019


Hi Nicolas,

I don't get it.

The coefficients are estimated through OLS. Essentially, you are just
applying the matrix pseudoinverse of X to y, where
beta = (X^T * X)^(-1) * X^T * y
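To make the formula concrete, here is a minimal sketch on synthetic data. The data, the seed, and the names (true_beta, beta_closed, beta_sklearn) are made up for illustration, and I am assuming no intercept so that the closed form lines up with the estimator:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): 100 rows, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.5, -2.0, 0.5])  # made-up coefficients
y = X @ true_beta + rng.normal(scale=0.1, size=100)

# Closed form: beta = (X^T X)^(-1) X^T y, computed via the pseudoinverse
beta_closed = np.linalg.pinv(X) @ y

# scikit-learn estimator; fit_intercept=False so it matches the formula above
model = LinearRegression(fit_intercept=False).fit(X, y)
beta_sklearn = model.coef_

print(beta_closed)
print(beta_sklearn)
```

On well-conditioned data like this, the two coefficient vectors should agree up to floating-point error.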

Splitting the data does not improve the model. It only helps in something
like LASSO, where you have a tuning parameter.

Holding out some data will make the regression estimates worse off.
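A rough simulation of what I mean, as a sketch only: the sample size, noise level, 70/30 split, and number of repetitions are all arbitrary choices of mine. It compares how far the estimated coefficients land from the (made-up) true ones when fitting on all rows versus only on a training split. It says nothing about how to measure predictive performance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
true_beta = np.array([1.5, -2.0, 0.5])  # made-up coefficients

err_full, err_train = [], []
for seed in range(500):
    # Fresh synthetic dataset on every repetition
    X = rng.normal(size=(100, 3))
    y = X @ true_beta + rng.normal(size=100)

    # Fit on every row
    full = LinearRegression(fit_intercept=False).fit(X, y)

    # Fit on a 70% training split only
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    part = LinearRegression(fit_intercept=False).fit(X_tr, y_tr)

    # Squared distance between estimated and true coefficients
    err_full.append(np.sum((full.coef_ - true_beta) ** 2))
    err_train.append(np.sum((part.coef_ - true_beta) ** 2))

mean_err_full = float(np.mean(err_full))
mean_err_train = float(np.mean(err_train))
print(mean_err_full, mean_err_train)
```

Averaged over the repetitions, the fit on the training split alone should show the larger coefficient error, simply because it uses fewer observations.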

Hope to hear from you, thanks!



On Sat, Jun 1, 2019 at 10:04 AM Nicolas Hug <niourf at gmail.com> wrote:

> Splitting the data into train and test data is needed with any machine
> learning model (not just linear regression with or without least squares).
>
> The idea is that you want to evaluate the performance of your model
> (prediction + scoring) on a portion of the data that you did not use for
> training.
>
> You'll find more details in the user guide
> https://scikit-learn.org/stable/modules/cross_validation.html
>
> Nicolas
>
>
> On 5/31/19 8:54 PM, C W wrote:
>
> Hello everyone,
>
> I'm new to scikit-learn. I see that many scikit-learn tutorials follow
> a workflow along the lines of
> 1) transform the data
> 2) split the data: train, test
> 3) instantiate the sklearn object and fit
> 4) predict and tune parameters
>
> But linear regression is fit by least squares, so I don't think a
> train/test split is necessary. So, I guess I can just use the entire dataset?
>
> Thanks in advance!
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>