A note, please, for Sebastian Raschka and mrschots.


The OLS model that I used (the one whose test score came out negative) was not a good fit. Initial findings showed that the regression coefficients and the model as a whole were significant, yet it ultimately failed two econometric diagnostics: the VIF (used to detect multicollinearity) and the Durbin-Watson test (used to detect autocorrelation). The presence of multicollinearity and autocorrelation in the model makes it unsuitable for prediction.
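
For readers unfamiliar with the two diagnostics, here is a minimal pure-Python sketch of what they compute. It assumes the textbook definitions: the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals, and the VIF of regressor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that regressor on the others. The helper names and toy numbers are hypothetical, and the VIF helper covers only the two-regressor case, where R_j^2 is the squared correlation between the two regressors. In practice one would more likely use statsmodels (`statsmodels.stats.stattools.durbin_watson` and `statsmodels.stats.outliers_influence.variance_inflation_factor`).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2).

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive and values toward 4 negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def vif_two_regressors(x1, x2):
    """VIF for a model with exactly two regressors (illustrative helper).

    With two regressors, R_j^2 equals their squared Pearson correlation,
    so both regressors share the same VIF = 1 / (1 - r^2).
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov ** 2 / (v1 * v2)
    return 1.0 / (1.0 - r_squared)


# Toy data (made up): alternating residuals are negatively autocorrelated.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Nearly collinear regressors give a large VIF; uncorrelated ones give 1.
vif_collinear = vif_two_regressors([1, 2, 3, 4], [1, 2, 3, 5])
vif_uncorrelated = vif_two_regressors([1, 2, 3, 4], [1, -1, -1, 1])

print(dw_alternating, vif_collinear, vif_uncorrelated)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, though the exact threshold is a convention rather than a formal test.
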
Regards, 

Samir K Mahajan. 

On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <samirkmahajan1972@gmail.com> wrote:
Thanks to all of you for your kind responses. Indeed, it is a great learning experience. Yes, econometrics books also build models for prediction, and programming really makes things easier in a complex world. My understanding is that machine learning depends on econometrics too.

My Regards, 

Samir K Mahajan 

On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <mail@sebastianraschka.com> wrote:
The R2 function in scikit-learn works fine. A negative value means that the regression model fits the data worse than a horizontal line representing the sample mean. E.g., you usually get that if you are overfitting the training set a lot and then apply that model to the test set. The econometrics book probably didn't cover applying a model to an independent dataset or test set, hence the [0, 1] suggestion.
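
To make this concrete, here is a minimal pure-Python sketch, assuming the standard definition R2 = 1 - SS_res / SS_tot, which is the definition scikit-learn documents for `r2_score`. The helper name `r2` and the toy numbers are made up for illustration. Predicting the mean of the evaluated targets gives exactly 0, and any predictions with a larger squared error than that baseline give a negative score.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot uses the mean of y_true, i.e. the horizontal-line baseline.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Toy test-set targets (made up for illustration).
y_test = [1.0, 2.0, 3.0, 4.0]

# Baseline: always predict the mean of the targets.
r2_mean = r2(y_test, [2.5, 2.5, 2.5, 2.5])

# Predictions close to the targets.
r2_good = r2(y_test, [1.1, 1.9, 3.2, 3.8])

# Predictions that are worse than the mean baseline.
r2_bad = r2(y_test, [4.0, 3.0, 2.0, 1.0])

print(r2_mean, r2_good, r2_bad)
```

Nothing in the formula is squared as a whole, so the score has no lower bound: it only stays within [0, 1] when the model is guaranteed to do at least as well as the mean, as with OLS with an intercept evaluated on its own training data.
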

Cheers,
Sebastian


On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <samirkmahajan1972@gmail.com>, wrote:

Dear Christophe Pallier,  Reshama Saikh and Tromek Drabas, 


Thank you for your kind response. Fair enough. I agree with you that R2 is not a square. However, if you open any econometrics book, it says that R2 is a ratio that lies between 0 and 1. This is the constraint. It measures the proportion or percentage of the total variation in the response variable (Y) explained by the regressors (Xs) in the model. The remaining proportion of variation in Y, if any, is explained by the residual term (u). Now, sklearn.metrics.r2_score gives me a negative value lying on a linear scale (-5.763335245921777). This negative value breaks the constraint. I just want to highlight that. I think it needs to be corrected. The rest is up to you.

I find that Reshama Saikh is hurt by my email. I am really sorry for that. Please note that I never meant to undermine your capabilities and initiatives. You are great people doing great jobs. I realise that I should have been more sensible.

My regards to all of you.

Samir K Mahajan 

On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <christophe@pallier.org> wrote:
Simple: despite its name, R2 is not a square. Look up its definition.

On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <samirkmahajan1972@gmail.com> wrote:
Dear All,
I am amazed to find negative values of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score in a model (cross-validation of an OLS regression model).
However, what amuses me more is seeing you justify negative 'sklearn.metrics.r2_score' values in your documentation. This does not make sense to me. Please justify to me how squared values can be negative.

Regards,
Samir K Mahajan. 

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn