[scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Samir K Mahajan
samirkmahajan1972 at gmail.com
Sat Aug 14 02:17:01 EDT 2021
Dear Christophe,
I think you are oversimplifying by saying econometrics tools are for
inference. Forecasting and prediction are integral parts of econometric
analysis. Econometricians forecast by drawing the right conclusions
about the model. I wish to convey to you that I teach both
statistics and econometrics, and am now learning ML. There are
fundamental differences among statistics, econometrics and machine
learning.
Regards,
Samir K Mahajan
On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier <christophe at pallier.org>
wrote:
> Indeed, this is basically what I told you (you do not need to copy
> textbook stuff: I taught probas/stats): these are mostly problems for
> *inference*.
>
> On Fri, 13 Aug 2021, 12:03 Samir K Mahajan, <samirkmahajan1972 at gmail.com>
> wrote:
>
>>
>> Dear Christophe Pallier,
>>
>> When we are doing prediction, we are relying on the values of the
>> coefficients of the model created. We feed test data to the model
>> for prediction. We may be interested to see whether the OLS
>> estimators (coefficients) are BLUE or not. In the presence of
>> autocorrelation (normally noticed in time series data), residuals are not
>> independent, and as such the OLS estimators are not BLUE in the sense that
>> they don't have minimum variance, and thus are no longer efficient estimators.
>> Statistical tests (t, F and χ²) may not be valid. We may reject the
>> model for making predictions in such a situation. We have to rely upon
>> other improved models. There may also be issues relating to multicollinearity
>> (in the case of a multivariable regression model) and heteroscedasticity
>> (mostly seen in cross-section data) in a model. Can we discard these tools
>> when using a model for prediction?
>>
>> Regards,
>>
>> Samir K Mahajan
>>
>>
>> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier <
>> christophe at pallier.org> wrote:
>>
>>> Actually, multicollinearity and autocorrelation are problems for
>>> *inference* more than for *prediction*. For example, if there is
>>> autocorrelation, the residuals are not independent, and the degrees of
>>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an
>>> AR1 model).
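For concreteness, here is a minimal NumPy sketch (synthetic data, not from the thread): it simulates AR(1) errors in an OLS fit and computes the Durbin-Watson statistic, which falls well below 2 when residuals are positively autocorrelated, roughly 2*(1 - rho). The coefficients, seed, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 500, 0.8

# Simulate AR(1) errors: e_t = rho * e_{t-1} + white noise
e = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + noise[t]

x = rng.normal(size=n)
y = 2.0 + 3.0 * x + e  # true model with autocorrelated errors

# Ordinary least squares via lstsq
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Durbin-Watson statistic: sum of squared successive differences
# of the residuals over their sum of squares
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(round(dw, 2))  # well below 2, flagging positive autocorrelation
```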
>>>
>>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, <samirkmahajan1972 at gmail.com>
>>> wrote:
>>>
>>>> A note please (to Sebastian Raschka, mrschots).
>>>>
>>>>
>>>> The OLS model that I used (where the test score gave me a negative
>>>> value) was not a good fit. Initial findings showed that the
>>>> regression coefficients and the model as a whole were significant, yet
>>>> it ultimately failed two econometric tests: VIF (used for detecting
>>>> multicollinearity) and the Durbin-Watson test (used for detecting
>>>> autocorrelation). The presence of multicollinearity and autocorrelation
>>>> in the model makes it unsuitable for prediction.
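The VIF check mentioned above can be sketched directly from its definition: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing regressor j on the remaining regressors. The data below are synthetic and deliberately near-collinear; the common rule of thumb treats VIF above about 10 as problematic.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Two nearly collinear regressors: x2 is x1 plus a little noise
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j] on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])  # both far above 10
```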
>>>> Regards,
>>>>
>>>> Samir K Mahajan.
>>>>
>>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <
>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>
>>>>> Thanks to all of you for your kind response. Indeed, it is a
>>>>> great learning experience. Yes, econometrics books too create models for
>>>>> prediction, and programming really makes things better in a complex
>>>>> world. My understanding is that machine learning does depend on
>>>>> econometrics too.
>>>>>
>>>>> My Regards,
>>>>>
>>>>> Samir K Mahajan
>>>>>
>>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
>>>>> mail at sebastianraschka.com> wrote:
>>>>>
>>>>>> The R2 function in scikit-learn works fine. A negative value means that
>>>>>> the regression model fits the data worse than a horizontal line
>>>>>> representing the sample mean. E.g., you usually get that if you overfit
>>>>>> the training set a lot and then apply that model to the test set. The
>>>>>> econometrics book probably didn't cover applying a model to an
>>>>>> independent dataset or test set, hence the [0, 1] suggestion.
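This point is easy to reproduce with a tiny, hand-checkable example (the numbers here are made up for illustration): predicting the sample mean gives an R² of exactly 0, and any prediction worse than the mean goes negative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predicting the sample mean gives R^2 = 0 ...
print(r2_score(y_true, np.full(4, y_true.mean())))  # 0.0

# ... and anything that fits worse than the mean goes negative:
# SS_res = 20, SS_tot = 5, so R^2 = 1 - 20/5 = -3.0
y_bad = np.array([4.0, 3.0, 2.0, 1.0])
print(r2_score(y_true, y_bad))  # -3.0
```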
>>>>>>
>>>>>> Cheers,
>>>>>> Sebastian
>>>>>>
>>>>>>
>>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <
>>>>>> samirkmahajan1972 at gmail.com>, wrote:
>>>>>>
>>>>>>
>>>>>> Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,
>>>>>> Thank you for your kind response. Fair enough, I accept that R2 is
>>>>>> not a square. However, if you open any econometrics textbook, it says R2
>>>>>> is a ratio that lies between 0 and 1. *This is the constraint.*
>>>>>> It measures the proportion or percentage of the total variation in the
>>>>>> response variable (Y) explained by the regressors (Xs) in the model.
>>>>>> The remaining proportion of variation in Y, if any, is explained by the
>>>>>> residual term (u). Now, sklearn.metrics.r2_score gives me a negative
>>>>>> value on a linear scale (-5.763335245921777). This negative
>>>>>> value breaks the *constraint*. I just want to highlight that. I
>>>>>> think it needs to be corrected. The rest is up to you.
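The two positions can be reconciled with the textbook formula itself, R² = 1 - SS_res/SS_tot: fitted in-sample by OLS with an intercept it is guaranteed to lie in [0, 1], but the same formula applied to held-out data has no lower bound. A self-contained sketch with synthetic data (the coefficients and shift are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(y, y_hat):
    """Textbook definition: 1 - SS_res / SS_tot, with no [0, 1] clipping."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Fit OLS with an intercept on a training sample
x_tr = rng.normal(size=30)
y_tr = 1.0 + 0.2 * x_tr + rng.normal(size=30)
X_tr = np.column_stack([np.ones(30), x_tr])
beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# In-sample, the formula is guaranteed to lie in [0, 1]
print(r_squared(y_tr, X_tr @ beta))

# On fresh data where the relationship has changed, it can drop below 0
x_te = rng.normal(loc=5.0, size=30)
y_te = 1.0 - 0.2 * x_te + rng.normal(size=30)
X_te = np.column_stack([np.ones(30), x_te])
print(r_squared(y_te, X_te @ beta))
```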
>>>>>>
>>>>>> I find that Reshama Shaikh is hurt by my email. I am really sorry
>>>>>> for that. Please note that I never meant to undermine your capabilities
>>>>>> and initiatives. You are great people doing great work. I realise that I
>>>>>> should have been more sensible.
>>>>>>
>>>>>> My regards to all of you.
>>>>>>
>>>>>> Samir K Mahajan
>>>>>>
>>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>>>>>> christophe at pallier.org> wrote:
>>>>>>
>>>>>>> Simple: despite its name, R2 is not a square. Look up its definition.
>>>>>>>
>>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <
>>>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear All,
>>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score
>>>>>>>> and sklearn.metrics.explained_variance_score in a model (cross-validation
>>>>>>>> of an OLS regression model).
>>>>>>>> However, what amuses me more is seeing you justify a negative
>>>>>>>> sklearn.metrics.r2_score in your documentation. This does not
>>>>>>>> make sense to me. Please explain to me how squared values can be negative.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Samir K Mahajan.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> scikit-learn mailing list
>>>>>>>> scikit-learn at python.org
>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>>>