[scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score

Samir K Mahajan samirkmahajan1972 at gmail.com
Fri Aug 13 06:02:55 EDT 2021


Dear Christophe Pallier,

When we make predictions, we rely on the coefficient values of the fitted
model: we feed test data to the model to obtain predictions. We may
therefore be interested in whether the OLS estimators (coefficients) are
BLUE. In the presence of autocorrelation (commonly seen in time series
data), the residuals are not independent, so the OLS estimators are not
BLUE: they no longer have minimum variance and are therefore no longer
efficient. Statistical tests (t, F and χ²) may not be valid. In such a
situation we may reject the model for making predictions and have to rely
on other, improved models. A model may also suffer from multicollinearity
(in a multivariable regression model) and heteroscedasticity (mostly seen
in cross-section data). Can we discard these tools when predicting with a
model?
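
As a rough illustration of how autocorrelation in residuals can be
detected, here is a minimal sketch of the Durbin-Watson statistic using
NumPy only. The helper name durbin_watson, the simulated series standing in
for regression residuals, and the value rho = 0.8 are all assumptions made
for this example, not part of the model discussed in this thread.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
n = 500

# Stand-in for residuals with no autocorrelation.
white = rng.normal(size=n)

# Stand-in for residuals with AR(1) structure: e_t = rho * e_{t-1} + noise.
rho = 0.8  # illustrative value
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = rho * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print("independent residuals:", dw_white)
print("AR(1) residuals:      ", dw_ar1)
```

The statistic is approximately 2 * (1 - rho), so values near 2 suggest no
first-order autocorrelation and values well below 2 suggest positive
autocorrelation.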

Regards,

Samir K Mahajan


On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier <christophe at pallier.org>
wrote:

> Actually, multicollinearity and autocorrelation are problems for
> *inference* more than for *prediction*. For example, if there is
> autocorrelation, the residuals are not independent, and the degrees of
> freedom are wrong for the tests in an OLS model (but you can use, e.g., an
> AR1 model).
>
> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, <samirkmahajan1972 at gmail.com>
> wrote:
>
>> A note, please (to Sebastian Raschka, mrschots).
>>
>> The OLS model that I used (where the test score gave me a negative
>> value) was not a good fit. Initial findings showed that the
>> regression coefficients and the model as a whole were significant, yet
>> it finally failed two econometric tests: VIF (used for detecting
>> multicollinearity) and the Durbin-Watson test (used for detecting
>> autocorrelation). The presence of multicollinearity and autocorrelation
>> problems in the model makes it unsuitable for prediction.
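
For readers unfamiliar with VIF, the following is a minimal sketch of how
variance inflation factors can be computed, assuming NumPy only. It relies
on the identity that the VIF of predictor j is the j-th diagonal entry of
the inverse of the predictors' correlation matrix. The three simulated
predictors (x1, x2, x3) are hypothetical and unrelated to the data discussed
in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on the other predictors. This equals the j-th diagonal element of the
# inverse of the correlation matrix of the predictors.
corr = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(corr))

for name, value in zip(["x1", "x2", "x3"], vif):
    print(name, value)
```

A common rule of thumb treats a VIF above about 10 as a sign of strong
multicollinearity; here x1 and x2 should be flagged and x3 should not.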
>> Regards,
>>
>> Samir K Mahajan.
>>
>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <
>> samirkmahajan1972 at gmail.com> wrote:
>>
>>> Thanks to all of you for your kind responses. Indeed, it is a
>>> great learning experience. Yes, econometrics books too build models for
>>> prediction, and programming really makes things better in a complex
>>> world. My understanding is that machine learning does depend on
>>> econometrics too.
>>>
>>> My Regards,
>>>
>>> Samir K Mahajan
>>>
>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
>>> mail at sebastianraschka.com> wrote:
>>>
>>>> The R2 function in scikit-learn works fine. A negative value means that
>>>> the regression model fits the data worse than a horizontal line
>>>> representing the sample mean. E.g., you usually get that if you are
>>>> overfitting the training set a lot and then apply that model to the
>>>> test set. The econometrics book probably didn't cover applying a model
>>>> to an independent dataset or test set, hence the [0, 1] suggestion.
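
The overfitting scenario described above can be sketched as follows,
assuming NumPy only. The helper r2 is a hypothetical hand-written version of
the usual definition 1 - SS_res / SS_tot (the same formula documented for
sklearn.metrics.r2_score), and the data are pure simulated noise, so any
fit on the training set is overfitting by construction.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_train, n_test, p = 20, 200, 15

# Features and target are independent noise: there is nothing to learn.
X_train = rng.normal(size=(n_train, p))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, p))
y_test = rng.normal(size=n_test)

# OLS with an intercept, fitted by least squares.
A_train = np.column_stack([np.ones(n_train), X_train])
A_test = np.column_stack([np.ones(n_test), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

r2_train = r2(y_train, A_train @ coef)
r2_test = r2(y_test, A_test @ coef)
r2_mean = r2(y_test, np.full(n_test, y_test.mean()))

print("train R2:", r2_train)           # in [0, 1] for OLS with intercept
print("test R2: ", r2_test)            # can be negative
print("mean-only R2 on test:", r2_mean)
```

Predicting the test-set mean for every point scores exactly 0, so a
negative test score says the fitted model does worse than that baseline.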
>>>>
>>>> Cheers,
>>>> Sebastian
>>>>
>>>>
>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <
>>>> samirkmahajan1972 at gmail.com>, wrote:
>>>>
>>>>
>>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
>>>> Thank you for your kind response. Fair enough. I agree with you that
>>>> R2 is not a square. However, if you open any book on econometrics, it
>>>> says R2 is a ratio that lies between 0 and 1. *This is the constraint.*
>>>> It measures the proportion or percentage of the total variation in the
>>>> response variable (Y) explained by the regressors (Xs) in the model. The
>>>> remaining proportion of variation in Y, if any, is explained by the
>>>> residual term (u). Now, sklearn.metrics.r2_score gives me a negative
>>>> value lying on a linear scale (-5.763335245921777). This negative value
>>>> breaks the *constraint.* I just want to highlight that. I think it needs
>>>> to be corrected. The rest is up to you.
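
For reference, a short sketch of why the [0, 1] range holds in-sample but
not on held-out data. The notation (SS_res, SS_reg, SS_tot) is the usual
textbook one and is an assumption here, not taken from this thread.

```latex
R^2 = 1 - \frac{SS_{res}}{SS_{tot}},
\qquad
SS_{res} = \sum_i (y_i - \hat{y}_i)^2,
\qquad
SS_{tot} = \sum_i (y_i - \bar{y})^2 .

% OLS with an intercept, evaluated on the data it was fitted to:
SS_{tot} = SS_{reg} + SS_{res}
\;\Longrightarrow\;
R^2 = \frac{SS_{reg}}{SS_{tot}} \in [0, 1].

% On new data the decomposition no longer holds, so SS_{res} may exceed SS_{tot}:
SS_{res} > SS_{tot} \;\Longrightarrow\; R^2 < 0 .
```

Under this definition, the reported score of about -5.76 would correspond
to a residual sum of squares roughly 6.76 times the total sum of squares on
the evaluation data.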
>>>>
>>>> I find that Reshama Saikh is hurt by my email. I am really sorry for
>>>> that. Please note that I never meant to undermine your capabilities and
>>>> initiatives. You are great people doing great jobs. I realise that I
>>>> should have been more sensible.
>>>>
>>>> My regards to all of you.
>>>>
>>>> Samir K Mahajan
>>>>
>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>>>> christophe at pallier.org> wrote:
>>>>
>>>>> Simple: despite its name R2 is not a square. Look up its definition.
>>>>>
>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <
>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>
>>>>>> Dear All,
>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score
>>>>>> and sklearn.metrics.explained_variance_score in a model (cross
>>>>>> validation of an OLS regression model).
>>>>>> However, what amuses me more is seeing you justify a negative
>>>>>> 'sklearn.metrics.r2_score' in your documentation. This does not
>>>>>> make sense to me. Please justify to me how squared values can be
>>>>>> negative.
>>>>>>
>>>>>> Regards,
>>>>>> Samir K Mahajan.
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>