From uros.pocek at gmail.com Mon Aug 2 05:03:55 2021
From: uros.pocek at gmail.com (Uroš Poček)
Date: Mon, 2 Aug 2021 11:03:55 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs
Message-ID:
Hello, I am a student and ML programmer and I have been using the scikit-learn
library for Python for a few years now on my PC, but recently I switched to an
M1 iMac, and when I tried to transfer my projects and pip install the
libraries they use, I ran into a bunch of issues. Long story short, I was able
to successfully install all ML libraries on my new Mac (tensorflow, numpy,
matplotlib, pandas, torch, ...) except scikit-learn (sklearn)! When can we
expect a version of this library that can be installed using pip on M1
Macs and used without any issues?
Thank you all in advance.
Uros Pocek
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rth.yurchak at gmail.com Mon Aug 2 05:15:48 2021
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Mon, 2 Aug 2021 11:15:48 +0200
Subject: [scikit-learn] [TC Vote] Technical Committee vote: line length
In-Reply-To:
References: <20210726212619.54iy56wbl4sdbe3z@phare.normalesup.org>
Message-ID: <482f3b2cfcff719baa446f3c2d4afc0b@gmail.com>
I also don't have a strong opinion on this, and generally I'm just happy
that the black migration happened.
Still, with a slight preference for 88 characters as the default.
On 28/07/2021 18:34, Olivier Grisel wrote:
> Many very active core devs not represented in the TC voted for 88 and
> my previous vote for 79 was not that strong. So I feel that I should
> now vote for 88:
>
> Keep current 88 characters:
>
> Olivier
>
> Revert to 79 characters:
>
From g.lemaitre58 at gmail.com Mon Aug 2 06:07:18 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Mon, 2 Aug 2021 12:07:18 +0200
Subject: [scikit-learn] scikit-learn for Apple Silicon M1 Macs
In-Reply-To:
References:
Message-ID:
There is currently no wheel available on PyPI because NumPy and SciPy do not provide wheels either:
https://github.com/scikit-learn/scikit-learn/issues/19137
However, one can use `miniforge` or `mambaforge` to install binaries without needing to build from source:
https://scikit-learn.org/stable/install.html#installing-on-apple-silicon-m1-hardware
NB: I am currently developing scikit-learn on an M1 using `mambaforge` and the process is pretty smooth.
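For anyone finding this thread later, the conda-forge route sketched in that install page looks roughly like this (a sketch assuming the standard Miniforge installer name from conda-forge's releases page; adjust paths and environment names to taste):

```shell
# Download the arm64 Miniforge installer (ships conda preconfigured with
# the conda-forge channel), then install it under $HOME/miniforge3.
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"
bash Miniforge3-MacOSX-arm64.sh -b -p "$HOME/miniforge3"
source "$HOME/miniforge3/bin/activate"

# Create an environment with scikit-learn; conda-forge provides native
# arm64 builds of numpy/scipy, so nothing is compiled from source.
conda create -n sklearn-env -y scikit-learn
conda activate sklearn-env
python -c "import sklearn; print(sklearn.__version__)"
```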

Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
> On 2 Aug 2021, at 11:03, Uroš Poček wrote:
>
> Hello, I am a student and ML programmer and I have been using the scikit-learn library for Python for a few years now on my PC, but recently I switched to an M1 iMac, and when I tried to transfer my projects and pip install the libraries they use, I ran into a bunch of issues. Long story short, I was able to successfully install all ML libraries on my new Mac (tensorflow, numpy, matplotlib, pandas, torch, ...) except scikit-learn (sklearn)! When can we expect a version of this library that can be installed using pip on M1 Macs and used without any issues?
>
> Thank you all in advance.
> Uros Pocek
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
From reshama.stat at gmail.com Thu Aug 5 10:38:18 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Thu, 5 Aug 2021 10:38:18 -0400
Subject: [scikit-learn] Open Source: sustainability and etiquette
In-Reply-To:
References:
Message-ID:
Hello,
I found the video; it's from 2017. It's by Heather Miller, a professor at
CMU. The 40-minute talk is entitled: The Dramatic Consequences of the
Open Source Revolution [a]
Brigitta,
Heather references Nadia Eghbal's book in her talk, which I also added to
my list. [b]
Adrin,
I added CHAOSS to the list as well. They have a mailing list which I have
subscribed to.
[a] https://youtu.be/K4mVuxcimWk
[b] https://www.dataumbrella.org/open-source/open-source-sustainability
Reshama Shaikh
she/her
Blog | Twitter | LinkedIn | GitHub
Data Umbrella
NYC PyLadies
On Mon, Apr 19, 2021 at 6:51 PM Brigitta Sipocz wrote:
> Hi,
>
> I've also very much liked Nadia Eghbal's book: Working in Public: The
> Making and Maintenance of Open Source Software. I haven't yet attended a
> conference where she was a speaker, but I'm certain there are some relevant
> recordings on youtube.
>
> Cheers,
> Brigitta
>
>
> On Mon, 19 Apr 2021 at 06:27, Adrin wrote:
>
>> This is a really good initiative Reshama, thanks for sharing.
>>
>> Have you seen CHAOSScon talks and activities? They're really good, and
>> touch on a lot of really good stuff when it comes to open source
>> communities and sustainability.
>> E.g.: https://chaoss.community/chaosscon-2020-eu/
>>
>> Cheers,
>> Adrin
>>
>> On Fri, Apr 16, 2021 at 4:26 PM Reshama Shaikh
>> wrote:
>>
>>> Hello,
>>> I've seen some excellent resources that have explained open source, its
>>> sustainability, challenges and *indirectly, the etiquette*.
>>>
>>> I am starting to compile the list here [a].
>>>
>>> This keynote by Stuart Geiger is a must-watch: The Invisible Work of
>>> Maintaining & Sustaining Open Source Software [b]
>>>
>>> There is one more video by Emily someone who was at Microsoft, but is
>>> now a professor somewhere, and I am trying to track that video down. I
>>> think it's from 2017. I'll add it to the list once I find it. If anyone
>>> knows the full name of the speaker, please share.
>>>
>>> [a]
>>> https://www.dataumbrella.org/open-source/open-source-sustainability
>>>
>>> [b]
>>> https://www.youtube.com/watch?v=PM3iltcaIL8
>>>
>>> Best,
>>> Reshama
>>> 
>>> Reshama Shaikh
>>> she/her
>>> Blog | Twitter | LinkedIn | GitHub
>>>
>>>
>>> Data Umbrella
>>> NYC PyLadies
>>>
From samirkmahajan1972 at gmail.com Wed Aug 11 15:16:34 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Thu, 12 Aug 2021 00:46:34 +0530
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score
and sklearn.metrics.explained_variance_score
Message-ID:
Dear All,
I am amazed to find negative values of sklearn.metrics.r2_score and
sklearn.metrics.explained_variance_score in a model (cross-validation of an
OLS regression model).
However, what amuses me more is seeing you justify a negative
'sklearn.metrics.r2_score' in your documentation. This does not
make sense to me. Please justify to me how squared values can be negative.
Regards,
Samir K Mahajan.
From drabas.t at gmail.com Wed Aug 11 15:29:09 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Wed, 11 Aug 2021 19:29:09 +0000
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID:
Hi Samir,
In the documentation there's a link to how the coefficient of determination is defined: https://en.m.wikipedia.org/wiki/Coefficient_of_determination. From this it is easy to see when the values can become negative: when the model performs significantly worse than the baseline (predicting the average for each observation).
A common misconception is that the 'squaredness' refers to some single squared value, but here (per the CoD's definition) it's the ratio of the squared distances under the baseline model and the estimated one.
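To make that concrete, here is a tiny sketch (with made-up numbers) computing R2 straight from that definition, which is the same formula sklearn.metrics.r2_score implements:

```python
def r2_score_by_definition(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (the general definition used by
    sklearn.metrics.r2_score)."""
    mean_y = sum(y_true) / len(y_true)
    # Baseline model: predict the sample mean for every observation.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Residual sum of squares of the fitted model.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]

# Predictions worse than just predicting the mean (2.0) -> negative R^2.
print(r2_score_by_definition(y_true, [5.0, 5.0, 5.0]))  # -13.5

# Predicting the mean itself gives exactly 0.
print(r2_score_by_definition(y_true, [2.0, 2.0, 2.0]))  # 0.0
```

So R2 is bounded above by 1 (perfect fit) but has no lower bound: the worse the model relative to the mean baseline, the more negative it gets.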
Hope this helps,
Tom
Sent on the go
________________________________
From: scikit-learn on behalf of Samir K Mahajan
Sent: Wednesday, August 11, 2021 12:16:34 PM
To: scikit-learn at python.org
Subject: [scikit-learn] Regarding negative value of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
Dear All,
I am amazed to find negative values of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score in a model ( cross validation of OLS regression model)
However, what amuses me more is seeing you justifying negative 'sklearn.metrics.r2_score ' in your documentation. This does not make sense to me . Please justify to me how squared values are negative.
Regards,
Samir K Mahajan.
From reshama.stat at gmail.com Wed Aug 11 15:35:06 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Wed, 11 Aug 2021 15:35:06 -0400
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID: <0A284AE81F6C4E6292B969CBD43B9C78@gmail.com>
Hello Samir,
The tone of your email is disrespectful for any project, but particularly so for an open source project. It is not appropriate for this community.
Please review the Code of Conduct for this library:
http://scikit-learn.org/stable/developers/contributing.html
Regards,
Reshama
> On Aug 11, 2021, at 3:18 PM, Samir K Mahajan wrote:
>
>
> Dear All,
> I am amazed to find negative values of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score in a model ( cross validation of OLS regression model)
> However, what amuses me more is seeing you justifying negative 'sklearn.metrics.r2_score ' in your documentation. This does not make sense to me . Please justify to me how squared values are negative.
>
> Regards,
> Samir K Mahajan.
>
From christophe at pallier.org Thu Aug 12 02:31:01 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Thu, 12 Aug 2021 08:31:01 +0200
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID:
Simple: despite its name, R2 is not a square. Look up its definition.
On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
wrote:
> Dear All,
> I am amazed to find negative values of sklearn.metrics.r2_score and
> sklearn.metrics.explained_variance_score in a model ( cross validation of
> OLS regression model)
> However, what amuses me more is seeing you justifying negative
> 'sklearn.metrics.r2_score ' in your documentation. This does not
> make sense to me . Please justify to me how squared values are negative.
>
> Regards,
> Samir K Mahajan.
>
From samirkmahajan1972 at gmail.com Thu Aug 12 15:18:45 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 00:48:45 +0530
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID:
Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,
Thank you for your kind responses. Fair enough, I go with you: R2 is not a
square. However, if you open any book of econometrics, it says R2 is a
ratio that lies between 0 and 1. *This is the constraint.* It measures
the proportion or percentage of the total variation in the response
variable (Y) explained by the regressors (Xs) in the model. The remaining
proportion of variation in Y, if any, is explained by the residual term (u).
Now, sklearn.metrics.r2_score gives me a negative value lying on a
linear scale (-5.763335245921777). This negative value breaks the *constraint.*
I just want to highlight that. I think it needs to be corrected. The rest is
up to you.
I find that Reshama Shaikh is hurt by my email. I am really sorry for
that. Please note I never undermine your capabilities and initiatives. You
are great people doing great jobs. I realise that I should have been more
sensible.
My regards to all of you.
Samir K Mahajan
On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier
wrote:
> Simple: despite its name R2 is not a square. Look up its definition.
>
> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
> wrote:
>
>> Dear All,
>> I am amazed to find negative values of sklearn.metrics.r2_score and
>> sklearn.metrics.explained_variance_score in a model ( cross validation of
>> OLS regression model)
>> However, what amuses me more is seeing you justifying negative
>> 'sklearn.metrics.r2_score ' in your documentation. This does not
>> make sense to me . Please justify to me how squared values are negative.
>>
>> Regards,
>> Samir K Mahajan.
>>
From maykonschots at gmail.com Thu Aug 12 15:30:34 2021
From: maykonschots at gmail.com (mrschots)
Date: Thu, 12 Aug 2021 16:30:34 -0300
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID:
There is no constraint; that's the point, since nothing stops you from having a
model with crap predictions, making it worse than just predicting the
target's mean for every data point.
If you do so -> negative R2.
Best Regards,
On Thu, 12 Aug 2021 at 16:21, Samir K Mahajan <
samirkmahajan1972 at gmail.com> wrote:
>
> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
> Thank you for your kind response. Fair enough. I go with you R2 is not a
> square. However, if you open any book of econometrics, it says R2 is a
> ratio that lies between 0 and 1. *This is the constraint.* It measures
> the proportion or percentage of the total variation in response
> variable (Y) explained by the regressors (Xs) in the model . Remaining
> proportion of variation in Y, if any, is explained by the residual term(u)
> Now, sklearn.matrics. metrics.r2_score gives me a negative value lying on a
> linear scale (5.763335245921777). This negative value breaks the *constraint.
> *I just want to highlight that. I think it needs to be corrected. Rest is
> up to you .
>
> I find that Reshama Saikh is hurt by my email. I am really sorry for
> that. Please note I never undermine your capabilities and initiatives. You
> are great people doing great jobs. I realise that I should have been more
> sensible.
>
> My regards to all of you.
>
> Samir K Mahajan
>
>
>
>
>
>
>
>
> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
> christophe at pallier.org> wrote:
>
>> Simple: despite its name R2 is not a square. Look up its definition.
>>
>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
>> wrote:
>>
>>> Dear All,
>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>> sklearn.metrics.explained_variance_score in a model ( cross validation of
>>> OLS regression model)
>>> However, what amuses me more is seeing you justifying negative
>>> 'sklearn.metrics.r2_score ' in your documentation. This does not
>>> make sense to me . Please justify to me how squared values are negative.
>>>
>>> Regards,
>>> Samir K Mahajan.
>>>

Schots
From drabas.t at gmail.com Thu Aug 12 15:41:02 2021
From: drabas.t at gmail.com (Tomek Drabas)
Date: Thu, 12 Aug 2021 12:41:02 -0700
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID:
In the simplest case of a simple linear regression, what you wrote holds
true: the total variance is simply the sum of the variance explained by the
model and the residual variability that cannot be explained, and R2 would
always lie between 0 and 1; e.g. here:
https://online.stat.psu.edu/stat500/lesson/9/9.3
However, this would be quite hard to do for more complex models (even for a
multivariate linear regression), thus the need for a more general definition,
like here: https://en.wikipedia.org/wiki/Coefficient_of_determination or
here: https://www.investopedia.com/terms/r/rsquared.asp. I can easily
envision a situation where the data has outliers (i.e. the data is not clean
enough to be used in modeling) such that it would render a model that performs
worse than a base model of simply taking the average as the prediction for
each observation.
Cheers,
Tom
On Thu, Aug 12, 2021 at 12:19 PM Samir K Mahajan <
samirkmahajan1972 at gmail.com> wrote:
>
> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
> Thank you for your kind response. Fair enough. I go with you R2 is not a
> square. However, if you open any book of econometrics, it says R2 is a
> ratio that lies between 0 and 1. *This is the constraint.* It measures
> the proportion or percentage of the total variation in response
> variable (Y) explained by the regressors (Xs) in the model . Remaining
> proportion of variation in Y, if any, is explained by the residual term(u)
> Now, sklearn.matrics. metrics.r2_score gives me a negative value lying on a
> linear scale (5.763335245921777). This negative value breaks the *constraint.
> *I just want to highlight that. I think it needs to be corrected. Rest is
> up to you .
>
> I find that Reshama Saikh is hurt by my email. I am really sorry for
> that. Please note I never undermine your capabilities and initiatives. You
> are great people doing great jobs. I realise that I should have been more
> sensible.
>
> My regards to all of you.
>
> Samir K Mahajan
>
>
>
>
>
>
>
>
> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
> christophe at pallier.org> wrote:
>
>> Simple: despite its name R2 is not a square. Look up its definition.
>>
>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
>> wrote:
>>
>>> Dear All,
>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>> sklearn.metrics.explained_variance_score in a model ( cross validation of
>>> OLS regression model)
>>> However, what amuses me more is seeing you justifying negative
>>> 'sklearn.metrics.r2_score ' in your documentation. This does not
>>> make sense to me . Please justify to me how squared values are negative.
>>>
>>> Regards,
>>> Samir K Mahajan.
>>>
From mail at sebastianraschka.com Thu Aug 12 15:28:03 2021
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 12 Aug 2021 14:28:03 -0500
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
Message-ID: <7d546c0043ef430cb8e0b046eb4748d6@Spark>
The R2 function in scikit-learn works fine. A negative value means that the regression model fits the data worse than a horizontal line representing the sample mean. E.g., you usually get that if you are overfitting the training set a lot and then apply that model to the test set. The econometrics book probably didn't cover applying a model to an independent data or test set, hence the [0, 1] suggestion.
Cheers,
Sebastian
On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan wrote:
>
> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
>
> Thank you for your kind response. Fair enough. I go with you R2 is not a square. However, if you open any book of econometrics, it says R2 is a ratio that lies between 0 and 1. This is the constraint. It measures the proportion or percentage of the total variation in response variable (Y) explained by the regressors (Xs) in the model. Remaining proportion of variation in Y, if any, is explained by the residual term (u). Now, sklearn.metrics.r2_score gives me a negative value lying on a linear scale (-5.763335245921777). This negative value breaks the constraint. I just want to highlight that. I think it needs to be corrected. Rest is up to you.
>
> I find that Reshama Saikh is hurt by my email. I am really sorry for that. Please note I never undermine your capabilities and initiatives. You are great people doing great jobs. I realise that I should have been more sensible.
>
> My regards to all of you.
>
> Samir K Mahajan
>
>
>
>
>
>
>
>
> > On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier wrote:
> > > Simple: despite its name R2 is not a square. Look up its definition.
> > >
> > > > On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, wrote:
> > > > > Dear All,
> > > > > I am amazed to find negative values of sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score in a model (cross validation of OLS regression model)
> > > > > However, what amuses me more is seeing you justifying negative 'sklearn.metrics.r2_score' in your documentation. This does not make sense to me. Please justify to me how squared values are negative.
> > > > >
> > > > > Regards,
> > > > > Samir K Mahajan.
> > > > >
From samirkmahajan1972 at gmail.com Thu Aug 12 16:11:17 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 01:41:17 +0530
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To: <7d546c0043ef430cb8e0b046eb4748d6@Spark>
References:
<7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Thanks to all of you for your kind responses. Indeed, it is a
great learning experience. Yes, econometrics books too create models for
prediction, and programming really makes things better in a complex
world. My understanding is that machine learning does depend on
econometrics too.
My Regards,
Samir K Mahajan
On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka
wrote:
> The R2 function in scikitlearn works fine. A negative means that the
> regression model fits the data worse than a horizontal line representing
> the sample mean. E.g. you usually get that if you are overfitting the
> training set a lot and then apply that model to the test set. The
> econometrics book probably didn't cover applying a model to an independent
> data or test set, hence the [0, 1] suggestion.
>
> Cheers,
> Sebastian
>
>
> On Aug 12, 2021, 2:20 PM 0500, Samir K Mahajan <
> samirkmahajan1972 at gmail.com>, wrote:
>
>
> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
> Thank you for your kind response. Fair enough. I go with you R2 is not a
> square. However, if you open any book of econometrics, it says R2 is a
> ratio that lies between 0 and 1. *This is the constraint.* It measures
> the proportion or percentage of the total variation in response
> variable (Y) explained by the regressors (Xs) in the model . Remaining
> proportion of variation in Y, if any, is explained by the residual term(u)
> Now, sklearn.matrics. metrics.r2_score gives me a negative value lying on a
> linear scale (5.763335245921777). This negative value breaks the
> *constraint.* I just want to highlight that. I think it needs to be
> corrected. Rest is up to you .
>
> I find that Reshama Saikh is hurt by my email. I am really sorry for
> that. Please note I never undermine your capabilities and initiatives. You
> are great people doing great jobs. I realise that I should have been more
> sensible.
>
> My regards to all of you.
>
> Samir K Mahajan
>
>
>
>
>
>
>
>
> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
> christophe at pallier.org> wrote:
>
>> Simple: despite its name R2 is not a square. Look up its definition.
>>
>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
>> wrote:
>>
>>> Dear All,
>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>> sklearn.metrics.explained_variance_score in a model ( cross validation of
>>> OLS regression model)
>>> However, what amuses me more is seeing you justifying negative
>>> 'sklearn.metrics.r2_score ' in your documentation. This does not
>>> make sense to me . Please justify to me how squared values are negative.
>>>
>>> Regards,
>>> Samir K Mahajan.
>>>
From samirkmahajan1972 at gmail.com Thu Aug 12 16:32:03 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 02:02:03 +0530
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
<7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
A note, please (to Sebastian Raschka, mrschots).
The OLS model that I used (where the test score gave me a negative
value) was not a good fit. Initial findings showed that *the
regression coefficients and the model as a whole were significant*, yet,
finally, it failed two econometrics tests: VIF (used for
detecting multicollinearity) and the Durbin-Watson test (used for detecting
autocorrelation). The *presence of multicollinearity and autocorrelation
problems* in the model makes it unsuitable for prediction.
Regards,
Samir K Mahajan.
On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan
wrote:
> Thanks to all of you for your kind response. Indeed, it is a
> great learning experience. Yes, econometrics books too create models for
> prediction, and programming really makes things better in a complex
> world. My understanding is that machine learning does depend on
> econometrics too.
>
> My Regards,
>
> Samir K Mahajan
>
> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
> mail at sebastianraschka.com> wrote:
>
>> The R2 function in scikitlearn works fine. A negative means that the
>> regression model fits the data worse than a horizontal line representing
>> the sample mean. E.g. you usually get that if you are overfitting the
>> training set a lot and then apply that model to the test set. The
>> econometrics book probably didn't cover applying a model to an independent
>> data or test set, hence the [0, 1] suggestion.
>>
>> Cheers,
>> Sebastian
>>
>>
>> On Aug 12, 2021, 2:20 PM 0500, Samir K Mahajan <
>> samirkmahajan1972 at gmail.com>, wrote:
>>
>>
>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
>> Thank you for your kind response. Fair enough. I go with you R2 is not
>> a square. However, if you open any book of econometrics, it says R2 is
>> a ratio that lies between 0 and 1. *This is the constraint.* It
>> measures the proportion or percentage of the total variation in response
>> variable (Y) explained by the regressors (Xs) in the model . Remaining
>> proportion of variation in Y, if any, is explained by the residual term(u)
>> Now, sklearn.matrics. metrics.r2_score gives me a negative value lying on a
>> linear scale (5.763335245921777). This negative value breaks the
>> *constraint.* I just want to highlight that. I think it needs to be
>> corrected. Rest is up to you .
>>
>> I find that Reshama Saikh is hurt by my email. I am really sorry for
>> that. Please note I never undermine your capabilities and initiatives. You
>> are great people doing great jobs. I realise that I should have been more
>> sensible.
>>
>> My regards to all of you.
>>
>> Samir K Mahajan
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>> christophe at pallier.org> wrote:
>>
>>> Simple: despite its name R2 is not a square. Look up its definition.
>>>
>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan,
>>> wrote:
>>>
>>>> Dear All,
>>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>>> sklearn.metrics.explained_variance_score in a model ( cross validation of
>>>> OLS regression model)
>>>> However, what amuses me more is seeing you justifying negative
>>>> 'sklearn.metrics.r2_score ' in your documentation. This does not
>>>> make sense to me . Please justify to me how squared values are negative.
>>>>
>>>> Regards,
>>>> Samir K Mahajan.
>>>>
From christophe at pallier.org Fri Aug 13 03:36:06 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 09:36:06 +0200
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
<7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Actually, multicollinearity and autocorrelation are problems for
*inference* more than for *prediction*. For example, if there is
autocorrelation, the residuals are not independent, and the degrees of
freedom are wrong for the tests in an OLS model (but you can use, e.g., an
AR(1) model).
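For reference, the Durbin-Watson statistic discussed above is easy to compute by hand from the residuals; here is a sketch with hypothetical residual series (values near 2 suggest no first-order autocorrelation, values near 0 strong positive autocorrelation, values near 4 negative autocorrelation):

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), summed over t = 2..n."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Slowly drifting residuals (strong positive autocorrelation) -> DW near 0.
print(durbin_watson([1.0, 1.1, 1.2, 1.1, 1.0, 0.9]))   # about 0.0075

# Alternating residuals (negative autocorrelation) -> DW well above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
```

This is the same statistic reported by common econometrics packages; the formula follows the standard textbook definition.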
On Thu, 12 Aug 2021, 22:32 Samir K Mahajan,
wrote:
> A note please (to Sebastian Raschka, mrschots).
>
>
> The OLS model that I used (where the test score gave me a negative
> value) was not a good fit. Initial findings showed that the
> regression coefficients and the model as a whole were significant, yet,
> finally, it failed two econometric tests: VIF (used for
> detecting multicollinearity) and the Durbin-Watson test (used for detecting
> autocorrelation). The presence of multicollinearity and autocorrelation
> problems in the model makes it unsuitable for prediction.
> Regards,
>
> Samir K Mahajan.
>
> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <
> samirkmahajan1972 at gmail.com> wrote:
>
>> Thanks to all of you for your kind response. Indeed, it is a
>> great learning experience. Yes, econometrics books too create models for
>> prediction, and programming really makes things better in a complex
>> world. My understanding is that machine learning does depend on
>> econometrics too.
>>
>> My Regards,
>>
>> Samir K Mahajan
>>
>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
>> mail at sebastianraschka.com> wrote:
>>
>>> The R2 function in scikit-learn works fine. A negative value means that the
>>> regression model fits the data worse than a horizontal line representing
>>> the sample mean. E.g. you usually get that if you overfit the
>>> training set a lot and then apply that model to the test set. The
>>> econometrics book probably didn't cover applying a model to independent
>>> data or a test set, hence the [0, 1] suggestion.
>>>
>>> Cheers,
>>> Sebastian
>>>
>>>
>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <
>>> samirkmahajan1972 at gmail.com>, wrote:
>>>
>>>
>>> Dear Christophe Pallier, Reshama Saikh and Tromek Drabas,
>>> Thank you for your kind response. Fair enough: I agree with you that R2 is
>>> not a square. However, if you open any book of econometrics, it says R2 is
>>> a ratio that lies between 0 and 1. *This is the constraint.* It
>>> measures the proportion or percentage of the total variation in the response
>>> variable (Y) explained by the regressors (Xs) in the model. The remaining
>>> proportion of variation in Y, if any, is explained by the residual term (u).
>>> Now, sklearn.metrics.r2_score gives me a negative value lying on a
>>> linear scale (-5.763335245921777). This negative value breaks the
>>> *constraint.* I just want to highlight that. I think it needs to be
>>> corrected. The rest is up to you.
>>>
>>> I find that Reshama Saikh is hurt by my email. I am really sorry for
>>> that. Please note I never undermine your capabilities and initiatives. You
>>> are great people doing great jobs. I realise that I should have been more
>>> sensible.
>>>
>>> My regards to all of you.
>>>
>>> Samir K Mahajan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>>> christophe at pallier.org> wrote:
>>>
>>>> Simple: despite its name R2 is not a square. Look up its definition.
>>>>
>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <
>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>
>>>>> Dear All,
>>>>> I am amazed to find negative values of sklearn.metrics.r2_score and
>>>>> sklearn.metrics.explained_variance_score in a model (cross-validation
>>>>> of an OLS regression model).
>>>>> However, what amuses me more is seeing you justify a negative
>>>>> 'sklearn.metrics.r2_score' in your documentation. This does not
>>>>> make sense to me. Please justify to me how squared values can be negative.
>>>>>
>>>>> Regards,
>>>>> Samir K Mahajan.
>>>>>
>
From samirkmahajan1972 at gmail.com Fri Aug 13 06:02:55 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Fri, 13 Aug 2021 15:32:55 +0530
Subject: [scikit-learn] Regarding negative value of
 sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
 <7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Dear Christophe Pallier,
When we are doing prediction, we are relying on the values of the
coefficients of the model created. We are feeding test data to the model
for prediction. We may be interested to see whether the OLS
estimators (coefficients) are BLUE or not. In the presence of
autocorrelation (normally noticed in time-series data), residuals are not
independent, and as such the OLS estimators are not BLUE in the sense that
they do not have minimum variance, and are thus no longer efficient
estimators. Statistical tests (t, F and chi-squared) may not be valid. We
may reject the model for making predictions in such a situation. We have
to rely upon other, improved models. There may also be issues relating to
multicollinearity (in the case of a multivariable regression model) and
heteroscedasticity (mostly seen in cross-section data) in a model. Can we
discard these tools while predicting with a model?
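As an aside, the Durbin-Watson statistic mentioned above is straightforward to compute from residuals. This is a minimal sketch with simulated residuals (statsmodels also ships a `durbin_watson` helper):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order
    autocorrelation; toward 0, positive; toward 4, negative."""
    resid = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=500)     # independent residuals -> DW near 2
trending = np.cumsum(white)      # strongly autocorrelated -> DW near 0
print(durbin_watson(white))
print(durbin_watson(trending))
```

On residuals like `trending`, the low statistic is the signal that OLS standard errors and tests cannot be trusted, which is the inference problem being discussed here.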
Regards,
Samir K Mahajan
On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier
wrote:
> Actually, multicollinearity and autocorrelation are problems for
> *inference* more than for *prediction*. For example, if there is
> autocorrelation, the residuals are not independent, and the degrees of
> freedom are wrong for the tests in an OLS model (but you can use, e.g., an
> AR1 model).
>
From christophe at pallier.org Fri Aug 13 06:08:29 2021
From: christophe at pallier.org (Christophe Pallier)
Date: Fri, 13 Aug 2021 12:08:29 +0200
Subject: [scikit-learn] Regarding negative value of
 sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
 <7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Indeed, this is basically what I told you (you do not need to copy
textbook stuff: I taught probas/stats): these are mostly problems for
*inference*.
On Fri, 13 Aug 2021, 12:03 Samir K Mahajan,
wrote:
>
> Dear Christophe Pallier*,*
>
> When we are doing prediction, we are relying on the values of the
> coefficients of the model created. We are feeding test data on the model
> for prediction. We may be nterested to see if the OLS
> estimators(coefficients) are BLUE or not. In the presence of
> autocorrelation (normally noticed in time series data), residuals are not
> independent, and as such the OLS estimators are not BLUE in the sense that
> they don't have minimum variance, and thus no more efficient estimators.
> Statistical tests (t, F and *?*2) may not be valid. We may reject the
> model to make predictions in such a situation. . We have to rely upon
> other improved models. There may be issues relating to multicollinearity
> (in case of multivariable regression model) and heteroscedasticity (mostly
> seen in crosssection data) too in a model. Can we discard these tools
> while predicting a model?
>
> Regards,
>
> Samir K Mahajan
>
>
From danshiebler at gmail.com Fri Aug 13 16:24:38 2021
From: danshiebler at gmail.com (Dan Shiebler)
Date: Fri, 13 Aug 2021 16:24:38 -0400
Subject: [scikit-learn] Regarding negative value of
 sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
 <7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Hey Samir, this blog post has some more details on the difference between
the square of the correlation coefficient and the coefficient of
determination: danshiebler.com/20170625metrics/
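The distinction is easy to demonstrate with a toy example (made-up numbers): the squared Pearson correlation ignores bias and scale, while the coefficient of determination penalizes them:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = y_true + 10.0   # perfectly correlated, but badly biased

r = np.corrcoef(y_true, y_pred)[0, 1]
print(r ** 2)                    # ~1.0: correlation ignores the offset
print(r2_score(y_true, y_pred))  # large negative: predictions are far off
```

Both are often called "R squared", which is the source of the confusion in this thread.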
On Fri, Aug 13, 2021 at 6:10 AM Christophe Pallier
wrote:
> Indeed , this is basically what I told you (you do not be need to copy
> textbook stuff: I taught probas/stats) : these are mostly problems for
> *inference*.
>

danshiebler.com
(973)  518  0886
From samirkmahajan1972 at gmail.com Sat Aug 14 02:17:01 2021
From: samirkmahajan1972 at gmail.com (Samir K Mahajan)
Date: Sat, 14 Aug 2021 11:47:01 +0530
Subject: [scikit-learn] Regarding negative value of
 sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
 <7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Dear Christophe,
I think you are oversimplifying by saying that econometric tools are for
inference. Forecasting and prediction are integral parts of econometric
analysis. Econometricians forecast by inferring the right conclusions
about the model. I wish to convey to you that I teach both
statistics and econometrics, and am now learning ML. There are
fundamental differences among statistics, econometrics and machine
learning.
Regards,
Samir K Mahajan
On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier
wrote:
> Indeed , this is basically what I told you (you do not be need to copy
> textbook stuff: I taught probas/stats) : these are mostly problems for
> *inference*.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From m.caorsi at l2f.ch Sat Aug 14 09:12:18 2021
From: m.caorsi at l2f.ch (Matteo Caorsi)
Date: Sat, 14 Aug 2021 13:12:18 +0000
Subject: [scikit-learn] random forests and multiclass probability
In-Reply-To:
References:
<031152d2ca5969eeb04c125fda724105@gmail.com>
<7D53A0FDEB5E4C27966BD6954EEF7398@gmail.com>
Message-ID:
Greetings!
I am currently out of the office, with limited access to email, until August 30th.
Please contact support at giotto.ai for technical issues concerning the Giotto Platform.
Otherwise, I will reply to your email as soon as possible upon my return.
With best regards,
Matteo
On 27 Jul 2021, at 12:42, Brown J.B. via scikit-learn wrote:
On Tue, 27 Jul 2021 at 12:03, Guillaume Lemaître wrote:
As far as I remember, `precision_recall_curve` and `roc_curve` do not support multiclass. They are designed to work only with binary classification.
Correct, the TPR-FPR curve (ROC) was originally intended for tuning a free parameter in signal detection, and is a binary-type metric.
For ML problems, it lets you tune/determine an estimator's output-value threshold (e.g., a probability, or a raw discriminant value such as in SVM) for arriving at an optimized model that will give a final, binary-discretized answer in new prediction tasks.
Hope this helps, J.B.
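As a concrete illustration of the threshold tuning described above, here is a minimal sketch on a binary problem; the synthetic dataset, the logistic regression estimator, and the use of Youden's J statistic are my own illustrative choices, not something from the thread:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# roc_curve sweeps every candidate threshold and reports FPR/TPR at each one
fpr, tpr, thresholds = roc_curve(y_test, scores)

# One common choice: the threshold maximizing Youden's J = TPR - FPR
best_threshold = thresholds[np.argmax(tpr - fpr)]

# Binary-discretized predictions at the tuned threshold
y_pred = (scores >= best_threshold).astype(int)
```

The tuned threshold then replaces the default 0.5 cut-off when the model is used for new predictions.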
_______________________________________________
scikitlearn mailing list
scikitlearn at python.org
https://mail.python.org/mailman/listinfo/scikitlearn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From m.caorsi at l2f.ch Sat Aug 14 09:12:23 2021
From: m.caorsi at l2f.ch (Matteo Caorsi)
Date: Sat, 14 Aug 2021 13:12:23 +0000
Subject: [scikit-learn] random forests and multiclass probability
In-Reply-To:
References:
Message-ID:
Greetings!
I am currently out of the office, with limited access to email, until August 30th.
Please contact support at giotto.ai for technical issues concerning the Giotto Platform.
Otherwise, I will reply to your email as soon as possible upon my return.
With best regards,
Matteo
On 27 Jul 2021, at 11:31, Sole Galli via scikit-learn wrote:
Thank you!
I was confused because the multiclass documentation says that estimators with built-in multiclass support, like decision trees and random forests, do not need to be wrapped in classes like One-vs-Rest.
Hence my question: if I want to determine the PR curves or the ROC curve, say with micro-averaging, do I need to wrap them in one-vs-rest, or does it not matter? The probability values do change slightly.
Thank you!
------- Original Message -------
On Tuesday, July 27th, 2021 at 11:22 AM, Guillaume Lemaître wrote:
On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:
Hello community,
Do I understand correctly that Random Forests are trained as one-vs-rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2; would the model then train 3 estimators, one per class, under the hood?
Each decision tree of the forest natively supports multiclass.
The predict_proba output is an array with 3 columns, containing the probability of each class. If it is one-vs-rest, am I correct to assume that the sum of the probabilities for the 3 classes does not necessarily add up to 1? Are they normalized? How is it done so that they do add up to 1?
According to the above answer, each row of the array given by `predict_proba` will sum to 1.
According to the documentation, the probabilities are computed as:
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
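That averaging behaviour is easy to check directly. A minimal sketch on the iris dataset (an arbitrary 3-class example of my choosing):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # 3-class target: 0, 1, 2
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = clf.predict_proba(X)  # shape (n_samples, 3)

# Each row is the mean of the per-tree class fractions, so it sums to 1
assert np.allclose(proba.sum(axis=1), 1.0)
```

No normalization step is needed: the mean of per-tree distributions that each sum to 1 also sums to 1.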
Thank you
Sole
scikitlearn mailing list
scikitlearn at python.org
https://mail.python.org/mailman/listinfo/scikitlearn
_______________________________________________
scikitlearn mailing list
scikitlearn at python.org
https://mail.python.org/mailman/listinfo/scikitlearn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From francois.dion at gmail.com Sat Aug 14 09:52:00 2021
From: francois.dion at gmail.com (Francois Dion)
Date: Sat, 14 Aug 2021 09:52:00 -0400
Subject: [scikit-learn] random forests and multiclass probability
In-Reply-To: <7D53A0FDEB5E4C27966BD6954EEF7398@gmail.com>
References: <7D53A0FDEB5E4C27966BD6954EEF7398@gmail.com>
Message-ID:
Yellowbrick has multi-label precision-recall curves and multiclass ROC/AUC built in:
https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html
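With plain scikit-learn, one way to get a micro-averaged multiclass ROC is to binarize the labels one-vs-rest and flatten. A minimal sketch; the dataset and estimator are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)

y_score = clf.predict_proba(X)             # shape (n_samples, 3)
y_onehot = label_binarize(y, classes=[0, 1, 2])

# Micro-average: treat every (sample, class) pair as one binary decision
fpr, tpr, _ = roc_curve(y_onehot.ravel(), y_score.ravel())
roc_auc = auc(fpr, tpr)
```

Evaluating on the training set, as here, inflates the AUC; in practice the scores would come from a held-out set or cross-validation.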
Sent from my iPad
> On Jul 27, 2021, at 6:03 AM, Guillaume Lemaître wrote:
>
> As far as I remember, `precision_recall_curve` and `roc_curve` do not support multiclass. They are designed to work only with binary classification.
> Then, we provide an example for precision-recall that shows one way to compute the precision-recall curve via averaging: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
> 
> Guillaume Lemaitre
> scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
>
>> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn wrote:
>>
>> Thank you!
>>
>> So when the multiclass documentation says that the algorithms listed there support multiclass intrinsically, and that they do not need to be wrapped in One-vs-Rest, it means the wrapper is unnecessary because each algorithm can indeed handle multiclass in its own way.
>>
>> But if I want to plot PR curves or ROC curves, then I do need to wrap them, because those metrics are calculated in a one-vs-rest manner, which is not how the algorithms themselves handle multiclass. Is my understanding correct?
>>
>> Thank you!
>>
>> ------- Original Message -------
>> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug wrote:
>>> To add to Guillaume's answer: the native multiclass support for forests/trees is described here: https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
>>>
>>> It's not a one-vs-rest strategy and can be summed up as:
>>>
>>>
>>>> Store n output values in leaves, instead of 1;
>>>>
>>>> Use splitting criteria that compute the average reduction across all n outputs.
>>>>
>>>
>>>
>>> Nicolas
>>>
>>> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>>>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn wrote:
>>>>>>
>>>>>> Hello community,
>>>>>>
>>>>>> Do I understand correctly that Random Forests are trained as one-vs-rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2; would the model then train 3 estimators, one per class, under the hood?
>>>>> Each decision tree of the forest natively supports multiclass.
>>>>>
>>>>> The predict_proba output is an array with 3 columns, containing the probability of each class. If it is one-vs-rest, am I correct to assume that the sum of the probabilities for the 3 classes does not necessarily add up to 1? Are they normalized? How is it done so that they do add up to 1?
>>>> According to the above answer, each row of the array given by `predict_proba` will sum to 1.
>>>> According to the documentation, the probabilities are computed as:
>>>>
>>>> The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
>>>>
>>>>> Thank you
>>>>> Sole
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> scikitlearn mailing list
>>>>> scikitlearn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fernando.wittmann at gmail.com Sat Aug 14 10:04:24 2021
From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann)
Date: Sat, 14 Aug 2021 11:04:24 -0300
Subject: [scikit-learn] Regarding negative value of
sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score
In-Reply-To:
References:
<7d546c0043ef430cb8e0b046eb4748d6@Spark>
Message-ID:
Hi Samir, the following visualization might be useful for gaining intuition
about the meaning of a negative R2:
https://gist.github.com/WittmannF/02060b45ce3ec9239898a5b91df2564e
A negative R2 means the model fits the data worse than a horizontal line at the
mean; predicting the opposite trend of the data is one way this happens.
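The point is easy to reproduce with made-up numbers (a minimal sketch; the values are my own illustration):

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

# Predicting the mean of y_true gives R^2 = 0 ...
print(r2_score(y_true, [2.0, 2.0, 2.0]))  # 0.0

# ... while predictions worse than the mean give a negative R^2.
# Here the model predicts the opposite trend of the data:
print(r2_score(y_true, [3.0, 2.0, 1.0]))  # -3.0
```

Since R^2 = 1 - SS_res/SS_tot, any model whose residual sum of squares exceeds that of the mean baseline scores below zero; nothing caps it at 0 from below.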
On Sat, Aug 14, 2021, 03:17 Samir K Mahajan
wrote:
> Dear Christophe,
> I think you are oversimplifying by saying econometrics tools are for
> inference. Forecasting and prediction are integral parts of econometric
> analysis. Econometricians forecast by inferring the right conclusions
> about the model. I wish to convey to you that I teach both
> statistics and econometrics, and am now learning ML. There are
> fundamental differences among statistics, econometrics and machine
> learning.
> Regards,
>
> Samir K Mahajan
>
> On Fri, Aug 13, 2021 at 3:39 PM Christophe Pallier
> wrote:
>
>> Indeed, this is basically what I told you (you do not need to copy
>> textbook stuff: I taught probas/stats): these are mostly problems for
>> *inference*.
>>
>> On Fri, 13 Aug 2021, 12:03 Samir K Mahajan,
>> wrote:
>>
>>>
>>> Dear Christophe Pallier,
>>>
>>> When we are doing prediction, we are relying on the values of the
>>> coefficients of the model created. We are feeding test data to the model
>>> for prediction. We may be interested to see whether the OLS
>>> estimators (coefficients) are BLUE or not. In the presence of
>>> autocorrelation (normally noticed in time-series data), residuals are not
>>> independent, and as such the OLS estimators are not BLUE in the sense that
>>> they don't have minimum variance, and thus are no longer efficient estimators.
>>> Statistical tests (t, F and χ²) may not be valid. We may reject the
>>> model for making predictions in such a situation. We have to rely upon
>>> other improved models. There may be issues relating to multicollinearity
>>> (in the case of a multivariable regression model) and heteroscedasticity (mostly
>>> seen in cross-sectional data) in a model too. Can we discard these tools
>>> when evaluating a model for prediction?
>>>
>>> Regards,
>>>
>>> Samir K Mahajan
>>>
>>>
>>> On Fri, Aug 13, 2021 at 1:07 PM Christophe Pallier <
>>> christophe at pallier.org> wrote:
>>>
>>>> Actually, multicollinearity and autocorrelation are problems for
>>>> *inference* more than for *prediction*. For example, if there is
>>>> autocorrelation, the residuals are not independent, and the degrees of
>>>> freedom are wrong for the tests in an OLS model (but you can use, e.g., an
>>>> AR(1) model).
>>>>
>>>> On Thu, 12 Aug 2021, 22:32 Samir K Mahajan, <
>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>
>>>>> A note please (to Sebastian Raschka, mrschots).
>>>>>
>>>>>
>>>>> The OLS model that I used (where the test score gave me a
>>>>> negative value) was not a good fit. Initial findings showed that the
>>>>> regression coefficients and the model as a whole were significant, yet
>>>>> it finally failed two econometric tests: VIF (used for
>>>>> detecting multicollinearity) and the Durbin-Watson test (used for detecting
>>>>> autocorrelation). The presence of multicollinearity and
>>>>> autocorrelation problems in the model makes it unsuitable for
>>>>> prediction.
>>>>> Regards,
>>>>>
>>>>> Samir K Mahajan.
>>>>>
>>>>> On Fri, Aug 13, 2021 at 1:41 AM Samir K Mahajan <
>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>
>>>>>> Thanks to all of you for your kind response. Indeed, it is a
>>>>>> great learning experience. Yes, econometrics books too create models for
>>>>>> prediction, and programming really makes things better in a complex
>>>>>> world. My understanding is that machine learning does depend on
>>>>>> econometrics too.
>>>>>>
>>>>>> My Regards,
>>>>>>
>>>>>> Samir K Mahajan
>>>>>>
>>>>>> On Fri, Aug 13, 2021 at 1:21 AM Sebastian Raschka <
>>>>>> mail at sebastianraschka.com> wrote:
>>>>>>
>>>>>>> The R2 function in scikit-learn works fine. A negative value means that
>>>>>>> the regression model fits the data worse than a horizontal line
>>>>>>> representing the sample mean. E.g., you usually get that if you are
>>>>>>> overfitting the training set a lot and then apply that model to the test
>>>>>>> set. The econometrics book probably didn't cover applying a model to an
>>>>>>> independent data or test set, hence the [0, 1] suggestion.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>> On Aug 12, 2021, 2:20 PM -0500, Samir K Mahajan <
>>>>>>> samirkmahajan1972 at gmail.com>, wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Christophe Pallier, Reshama Shaikh and Tomek Drabas,
>>>>>>> Thank you for your kind response. Fair enough, I go with you: R2 is
>>>>>>> not a square. However, if you open any book of econometrics, it says R2
>>>>>>> is a ratio that lies between 0 and 1. *This is the constraint.*
>>>>>>> It measures the proportion or percentage of the total variation in the
>>>>>>> response variable (Y) explained by the regressors (Xs) in the model.
>>>>>>> The remaining proportion of variation in Y, if any, is explained by the
>>>>>>> residual term (u). Now, sklearn.metrics.r2_score gives me a negative
>>>>>>> value lying on a linear scale (-5.763335245921777). This negative
>>>>>>> value breaks the *constraint.* I just want to highlight that. I
>>>>>>> think it needs to be corrected. The rest is up to you.
>>>>>>>
>>>>>>> I find that Reshama Shaikh is hurt by my email. I am really sorry
>>>>>>> for that. Please note I never undermine your capabilities and initiatives.
>>>>>>> You are great people doing great jobs. I realise that I should have been
>>>>>>> more sensitive.
>>>>>>>
>>>>>>> My regards to all of you.
>>>>>>>
>>>>>>> Samir K Mahajan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 12, 2021 at 12:02 PM Christophe Pallier <
>>>>>>> christophe at pallier.org> wrote:
>>>>>>>
>>>>>>>> Simple: despite its name R2 is not a square. Look up its definition.
>>>>>>>>
>>>>>>>> On Wed, 11 Aug 2021, 21:17 Samir K Mahajan, <
>>>>>>>> samirkmahajan1972 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear All,
>>>>>>>>> I am amazed to find negative values of sklearn.metrics.r2_score
>>>>>>>>> and sklearn.metrics.explained_variance_score in a model (cross-validation
>>>>>>>>> of an OLS regression model).
>>>>>>>>> However, what amuses me more is seeing you justify negative
>>>>>>>>> 'sklearn.metrics.r2_score' in your documentation. This does not
>>>>>>>>> make sense to me. Please explain to me how squared values can be negative.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Samir K Mahajan.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> scikitlearn mailing list
>>>>>>>>> scikitlearn at python.org
>>>>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> scikitlearn mailing list
>>>>>>>> scikitlearn at python.org
>>>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> scikitlearn mailing list
>>>>>>> scikitlearn at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> scikitlearn mailing list
>>>>>>> scikitlearn at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>>>
>>>>>> _______________________________________________
>>>>> scikitlearn mailing list
>>>>> scikitlearn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>>
>>>> _______________________________________________
>>>> scikitlearn mailing list
>>>> scikitlearn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>>
>>> _______________________________________________
>>> scikitlearn mailing list
>>> scikitlearn at python.org
>>> https://mail.python.org/mailman/listinfo/scikitlearn
>>>
>> _______________________________________________
>> scikitlearn mailing list
>> scikitlearn at python.org
>> https://mail.python.org/mailman/listinfo/scikitlearn
>>
> _______________________________________________
> scikitlearn mailing list
> scikitlearn at python.org
> https://mail.python.org/mailman/listinfo/scikitlearn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From adrin.jalali at gmail.com Mon Aug 16 05:56:57 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Mon, 16 Aug 2021 11:56:57 +0200
Subject: [scikit-learn] Pandas copy-on-write proposal
MessageID:
Hi there,
I'd like to bring your attention to a proposal being discussed among pandas
developers, regarding copy-on-write semantics.
A very short summary of the proposal, according to the document, is:
*The result of any indexing operation (subsetting a DataFrame or Series
in any way, i.e. including accessing a DataFrame column as a Series) or any
method returning a new DataFrame or Series always behaves as if it were a
copy in terms of user API. We implement copy-on-write (as an implementation
detail). This way, we can actually use views as much as possible under the
hood, while ensuring the user API behaves as a copy.*
*As a consequence, if you want to modify an object (DataFrame or Series),
the only way to do this is to modify that object itself directly.*
*This addresses multiple aspects: 1) a clear and consistent user API (a
clear rule: any subset or returned series/dataframe always behaves as a
copy of the original, and thus never modifies the original) and 2)
improving performance by avoiding excessive copies (e.g. a chained method
workflow would no longer return an actual data copy at each step). Because
every single indexing step behaves as a copy, this also means that with
this proposal, "chained assignment" (with multiple setitem steps) will
never work.*
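To make the "chained assignment" point concrete, here is a minimal sketch of my own; the chained form's behaviour depends on the pandas version and on whether the proposal is adopted, so only the direct `.loc` form is shown as the reliable idiom:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained assignment: two indexing steps. Under the proposal each step
# behaves as a copy, so the original df would never be modified:
#     df[df["a"] > 1]["b"] = 0   # would have no effect on df

# The supported way: modify the object itself directly with one .loc call
df.loc[df["a"] > 1, "b"] = 0
print(df["b"].tolist())  # [10, 0, 0]
```

The single `.loc` setitem works under both the current semantics and the proposed copy-on-write semantics.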
You can also read the related discussion on the pandas mailing list here.
It would be nice for us to think about the implications of this proposal for
our work related to supporting pandas dataframes.
Cheers,
Adrin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From petrizzo at gmail.com Mon Aug 16 17:30:33 2021
From: petrizzo at gmail.com (Mariangela Petrizzo)
Date: Mon, 16 Aug 2021 17:30:33 -0400
Subject: [scikit-learn] Spanish translation proposal for scikit-learn
documentation
In-Reply-To:
References:
Message-ID: <371E9F54EB4F45C7AE1907E1E769BC40@getmailspring.com>
Hello everyone!
We are writing briefly to announce that the Spanish translation of the scikit-learn 0.24.2 documentation is now available at:
https://qu4nt.github.io/sklearn-doc-es/index.html
Soon we will add to that repository the suggested workflow for future translations of this documentation. We are now in the final phase of this work, debugging and fine-tuning the last details, but we update the HTML version daily.
It has been a great pleasure for our team to support the Spanish-speaking community of users of this library, and the Python community in general, with our work.
Mariángela Petrizzo
http://qu4nt.com
María Ángela Petrizzo Páez About Me (about.me/petrizzo)
Descárgate Redes para la Comprensión de la Política (http://www.elperroylarana.gob.ve/redesparalacomprensiondelapolitica/)
Usuario Linux # 498889
Miembro Red de Politólogas - #NoSinMujeres (https://www.nosinmujeres.com/)
Publicaciones (https://hotelescuela.academia.edu/MariangelaPetrizzoPaez)
ORCID (http://orcid.org/0000-0001-9483-4185)
PEII - Nivel B
On Feb 9, 2021, at 4:15 pm, Mariangela Petrizzo wrote:
> Dear scikit-learn team!
>
>
> I am Mariángela Petrizzo, and I am writing to you as a member of Qu4nt, a team dedicated to the use of open-source tools for the development of software solutions with an emphasis on data science. We have a strong interest in translating the scikit-learn documentation into Spanish.
> Our team is made up of members from various scientific fields, including some university faculty in linguistics and computer science, with wide experience in Python as well as several libraries used for data analysis and machine learning. We also contribute locally as evangelists of its use in Spanish-speaking communities; in particular, our leader initiated the translation of some Software Carpentry lessons into Spanish.
> That is why we have been discussing the opportunity to offer our contribution to the Python project, promoting the translation into Spanish of the documentation of some of the libraries with the greatest impact in our areas of interest. Talking with David Mertz, to whom we are sending a copy of this email, we have explored options, and the idea of working with scikit-learn has seemed an exceptional opportunity for all of us and the community. He is very enthusiastic about the idea of generating a Spanish translation of scientific Python libraries like scikit-learn.
> For us, this translation project has to be done through completely open work on GitHub, taking as reference the reStructuredText sources for Sphinx from a git fork, using the tools provided by Sphinx itself for internationalization: https://www.sphinx-doc.org/en/1.8/intl.html, and applying tags to perform planned updates. In addition, as with any open-source project, the main mechanism for quality assurance comes from the users themselves, who will have channels available for submitting issues. Our intention is to secure all the infrastructure and mechanisms to make this possible: making the process transparent through GitHub, using tools like Transifex as much as possible to facilitate participation, and providing guidelines for contributors as part of the project.
> Of course, this project cannot be realized without your support. We therefore come to you to inquire about your willingness to accompany and support this project.
> We would love to hear your feedback on our proposal.
> Best regards,
>
>
> Mariángela
>
>
> 
>
> María Ángela Petrizzo Páez
>
> about.me/petrizzo (https://about.me/petrizzo?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=edit_panel&utm_content=plaintext)
>
>
>
>
>
>
>
>
>
>
>
>
> Descárgate Redes para la Comprensión de la Política (http://www.elperroylarana.gob.ve/redesparalacomprensiondelapolitica/)
>
>
>
> > To those who keep hope, which is not the last thing to be lost, but the first thing to be sown and, therefore, the most radical.
>
>
> The only way to defeat the hijacking of knowledge
> is to understand its reasons.
> The way to reverse it
> is to become hackers of everyday hijackings,
> so as not to die without knowing what we are.
>
> Think in order to live,
> act in order to hack!
> Each day, one commons-minded action at a time.
>
> > "I have a horror of those whose words go further than their actions."
> > Albert Camus
> >
> > "Power, far from hindering knowledge, produces it." - Michel Foucault
> Usuario Linux # 498889
> Miembro Red de Politólogas - #NoSinMujeres (http://www.nosinmujeres.com/)
> https://hotelescuela.academia.edu/MariangelaPetrizzoPaez
> http://orcid.org/0000-0001-9483-4185
> PEII - Nivel B
>
>
>
>
>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From reshama.stat at gmail.com Tue Aug 17 09:03:08 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Tue, 17 Aug 2021 09:03:08 -0400
Subject: [scikit-learn] Spanish translation proposal for scikit-learn
documentation
In-Reply-To: <371E9F54EB4F45C7AE1907E1E769BC40@getmailspring.com>
References:
<371E9F54EB4F45C7AE1907E1E769BC40@getmailspring.com>
Message-ID:
Hi Mariángela,
That's an impressive accomplishment! Congratulations.
A PR can be submitted to add the Spanish translation link to this page in the
scikit-learn documentation:
https://scikit-learn.org/dev/related_projects.html#translations-of-scikit-learn-documentation
Reshama Shaikh
she/her
Blog  Twitter
 LinkedIn  GitHub
Data Umbrella
NYC PyLadies
On Mon, Aug 16, 2021 at 5:32 PM Mariangela Petrizzo
wrote:
>
> Hello everyone!
>
> We are writing briefly to announce that the Spanish translation of the
> scikit-learn 0.24.2 documentation is now available at:
>
> https://qu4nt.github.io/sklearn-doc-es/index.html
>
> Soon we will add to that repository the suggested workflow for future
> translations of this documentation. We are now in the final phase of this
> work, debugging and fine-tuning the last details, but we update the HTML
> version daily.
>
> It has been a great pleasure for our team to support the Spanish community
> of users of this library and the Python community in general, with our work.
>
>
> Mariángela Petrizzo
> http://qu4nt.com
>
> María Ángela Petrizzo Páez About Me
> Descárgate Redes para la Comprensión de la Política
>
> Usuario Linux # 498889
> Miembro Red de Politólogas - #NoSinMujeres
> Publicaciones
> ORCID PEII - Nivel B
> On Feb 9, 2021, at 4:15 pm, Mariangela Petrizzo
> wrote:
>
> Dear scikit-learn team!
>
>
>
> I am Mariángela Petrizzo, and I am writing to you as a member of Qu4nt, a team
> dedicated to the use of open-source tools for the development of software
> solutions with an emphasis on data science. We have a strong interest in
> translating the scikit-learn documentation into Spanish.
>
> Our team is made up of members from various scientific fields, including
> some university faculty in linguistics and computer science, with wide
> experience in Python as well as several libraries used for data analysis
> and machine learning. We also contribute locally as evangelists of its
> use in Spanish-speaking communities; in particular, our leader initiated
> the translation of some Software Carpentry lessons into Spanish.
>
> That is why we have been discussing the opportunity to offer our
> contribution to the Python project, promoting the translation into Spanish
> of the documentation of some of the libraries with the greatest impact in
> our areas of interest. Talking with David Mertz, to whom we are sending a
> copy of this email, we have explored options, and the idea of working with
> scikit-learn has seemed an exceptional opportunity for all of
> us and the community. He is very enthusiastic about the idea of generating a
> Spanish translation of scientific Python libraries like scikit-learn.
>
> For us, this translation project has to be done through completely open
> work on GitHub, taking as reference the reStructuredText sources for
> Sphinx from a git fork, using the tools provided by Sphinx itself for
> internationalization: https://www.sphinx-doc.org/en/1.8/intl.html,
> and applying tags to
> perform planned updates. In addition, as with any open-source project, the
> main mechanism for quality assurance comes from the users themselves, who
> will have channels available for submitting issues. Our intention is to
> secure all the infrastructure and mechanisms to make this possible: making
> the process transparent through GitHub, using tools like Transifex as much
> as possible to facilitate participation, and providing guidelines for
> contributors as part of the project.
>
> Of course, this project cannot be realized without your support. We
> therefore come to you to inquire about your willingness to accompany and
> support this project.
>
> We would love to hear your feedback on our proposal.
>
> Best regards,
>
>
>
> Mariángela
>
>
> 
>
>
>
> María Ángela Petrizzo Páez
> about.me/petrizzo
>
> Download "Redes para la Comprensión de la Política"
>
>
> *To those who keep hope: not the last thing to be lost, but the first
> thing to be sown and, therefore, the most radical.*
>
>
> The only way to defeat the hijacking of knowledge
> is to understand its reasons.
> The way to reverse it
> is to become hackers of the everyday hijackings,
> so as not to die without knowing what we are.
>
> Think to live,
> act to hack!
> Each day, one commons-minded action at a time.
>
>
> *"I have a horror of those whose words go further than their acts."*
> *Albert Camus*
>
> *"Power, far from hindering knowledge, produces it." (Michel Foucault)*
>
>
> Linux user #498889
> Member, Red de Politólogas #NoSinMujeres
> https://hotelescuela.academia.edu/MariangelaPetrizzoPaez
> http://orcid.org/0000-0001-9483-4185
> PEII, Nivel B
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
 next part 
An HTML attachment was scrubbed...
URL:
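As context for the Sphinx internationalization workflow mentioned above, translated documentation is usually wired up through a few settings in the project's conf.py; here is a minimal sketch (the directory names are assumptions for illustration, not scikit-learn's actual layout):

```python
# conf.py fragment: settings Sphinx uses to emit and consume gettext catalogs.
locale_dirs = ["locale/"]   # where per-language .po/.mo translation catalogs live
gettext_compact = False     # emit one .pot template per source document
language = "es"             # build the Spanish translation when catalogs exist
```

With these in place, `sphinx-build -b gettext . _build/gettext` extracts the translatable strings into .pot templates, and `sphinx-intl update -p _build/gettext -l es` creates or refreshes the Spanish .po files for translators.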
From johngrenci61 at yahoo.com Fri Aug 20 18:15:09 2021
From: johngrenci61 at yahoo.com (John Grenci)
Date: Fri, 20 Aug 2021 22:15:09 +0000 (UTC)
Subject: [scikit-learn] can't install scikit-learn
References: <1717831625.594362.1629497709231.ref@mail.yahoo.com>
Message-ID: <1717831625.594362.1629497709231@mail.yahoo.com>
Hello, hoping somebody can help me.

I have tried what seems like everything.

I get an OS error:

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\ameri\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python39\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz'

HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths

I tried enabling more than 260 characters as suggested, but that did not help; it actually gave me a different error.

I don't think it has to do with bits, as my computer is 64-bit.
I also tried pip install sklearn.

I am at a loss at this point.

PS: I am not the most techy of persons, and I looked everywhere online that I could.
Can somebody help?

Thanks,
John
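For context on the error above: the failing filename is long enough to push the full install path past the legacy 260-character Windows MAX_PATH limit, which is why enabling long path support (or shortening the filename) resolves it. A quick sanity check; the path below is reconstructed from the error message, so treat it as an approximation:

```python
# Sketch: verify that the install path from the traceback exceeds Windows'
# legacy MAX_PATH limit of 260 characters.
MAX_PATH = 260
path = (
    r"C:\Users\ameri\AppData\Local\Packages"
    r"\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0"
    r"\LocalCache\local-packages\Python39\site-packages"
    r"\sklearn\datasets\tests\data\openml\292"
    r"\api-v1-json-data-list-data_name-australian-limit-2"
    r"-data_version-1-status-deactivated.json.gz"
)
print(len(path), len(path) > MAX_PATH)
```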
From mablue92 at gmail.com Sun Aug 22 02:17:05 2021
From: mablue92 at gmail.com (Masoud Azizi)
Date: Sun, 22 Aug 2021 10:47:05 +0430
Subject: [scikit-learn] how the skopt optimizer avoids flats
Message-ID:
Hi all, I'm new to the sk mailing list :) I need your help with this:
how does the skopt optimizer avoid these flat regions?
Is there a place in the code where I can find that out?
See the attachment.
 next part 
A nontext attachment was scrubbed...
Name: unnamed.png
Type: image/png
Size: 21512 bytes
Desc: not available
URL:
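On the flat-regions question above: Bayesian optimizers such as skopt do not rank candidates by the surrogate model's predicted mean alone; the acquisition function adds an uncertainty bonus, so a point in a flat region still scores well wherever the model is uncertain. A minimal, library-free sketch of the upper-confidence-bound idea (the toy mean and uncertainty functions are invented for illustration, not skopt's actual internals):

```python
import math

def ucb(mu, sigma, kappa=2.0):
    # Upper confidence bound: predicted mean plus an exploration bonus.
    return mu + kappa * sigma

# Toy surrogate: the predicted objective is flat everywhere, but model
# uncertainty grows with distance from the only sampled point at x = 0.
mean = lambda x: 0.5
uncertainty = lambda x: 1.0 - math.exp(-x)

candidates = [0.0, 1.0, 2.0, 3.0]
best = max(candidates, key=lambda x: ucb(mean(x), uncertainty(x)))
print(best)  # the most uncertain candidate wins despite the flat mean
```

In skopt itself, `gp_minimize` exposes this trade-off through its `acq_func` parameter ("LCB", "EI", "PI"), with `kappa` and `xi` controlling how strongly uncertainty is rewarded.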
From skacanski at gmail.com Sun Aug 22 16:24:45 2021
From: skacanski at gmail.com (Sasha Kacanski)
Date: Sun, 22 Aug 2021 16:24:45 -0400
Subject: [scikit-learn] can't install scikit-learn
In-Reply-To: <1717831625.594362.1629497709231@mail.yahoo.com>
References: <1717831625.594362.1629497709231.ref@mail.yahoo.com>
<1717831625.594362.1629497709231@mail.yahoo.com>
Message-ID:
How about a Linux desktop for a change? I suggest Debian or Arch!
On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn <
scikit-learn at python.org> wrote:
> ...

Aleksandar Kacanski  Sasha
From rdslater at gmail.com Sun Aug 22 16:42:10 2021
From: rdslater at gmail.com (Robert Slater)
Date: Sun, 22 Aug 2021 15:42:10 -0500
Subject: [scikit-learn] can't install scikit-learn
In-Reply-To: <1717831625.594362.1629497709231@mail.yahoo.com>
References: <1717831625.594362.1629497709231.ref@mail.yahoo.com>
<1717831625.594362.1629497709231@mail.yahoo.com>
Message-ID:
What was the second error?
What version of Python are you using? What version of Windows are you using?
This will help troubleshoot the issue.
On Fri, Aug 20, 2021, 5:16 PM John Grenci via scikit-learn <
scikit-learn at python.org> wrote:
> ...
From thomasjpfan at gmail.com Sun Aug 22 17:11:22 2021
From: thomasjpfan at gmail.com (Thomas J. Fan)
Date: Sun, 22 Aug 2021 17:11:22 -0400
Subject: [scikit-learn] can't install scikit-learn
In-Reply-To:
References: <1717831625.594362.1629497709231.ref@mail.yahoo.com>
<1717831625.594362.1629497709231@mail.yahoo.com>
Message-ID:
Here are instructions on how to resolve the issue:
https://scikit-learn.org/stable/install.html#error-caused-by-file-path-length-limit-on-windows
In the upcoming release of scikit-learn, we have reduced the number of
characters in the filename. This should resolve this issue without needing
to edit the Windows registry.
Thomas
On Sun, Aug 22, 2021 at 4:44 PM Robert Slater wrote:
> ...
From varavind121 at yahoo.com Sun Aug 22 17:13:23 2021
From: varavind121 at yahoo.com (aravind ramesh)
Date: Sun, 22 Aug 2021 21:13:23 +0000 (UTC)
Subject: [scikit-learn] can't install scikit-learn
In-Reply-To:
References: <1717831625.594362.1629497709231.ref@mail.yahoo.com>
<1717831625.594362.1629497709231@mail.yahoo.com>
Message-ID: <1741730219.681482.1629666803984@mail.yahoo.com>
Hi,
Try using the Anaconda Python distribution (Anaconda Individual Edition); it comes with scikit-learn, so there is no hassle of dealing with dependency issues.
On Monday, August 23, 2021, 01:56:42 AM GMT+5:30, Sasha Kacanski wrote:
How about a Linux desktop for a change? I suggest Debian or Arch!
On Fri, Aug 20, 2021 at 6:17 PM John Grenci via scikit-learn