[scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

Jacob Vanderplas jakevdp at cs.washington.edu
Fri Feb 3 16:51:07 EST 2017


Hi Afarin,
The short answer is no: you can't really compute p-values and related
statistics for model parameters in scikit-learn.

This stems from a fundamental divide in statistics/AI between machine
learning on one hand, and statistical modeling on the other. A classic
treatment of this divide is Leo Breiman's "Statistical Modeling: The Two
Cultures" (2001).

In short, statistical modeling is about *estimating parameters of models*,
and in that context things like significance, p-values, etc. are relevant.
Machine learning is about *predicting outputs*, and generally treats models
and their parameters as a black box, the contents of which are not of any
explicit interest. As such, p-values and related statistics concerning
model parameters are not a concern.
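
To make the distinction concrete, here is a minimal sketch of the
prediction-side notion of significance, using the permutation_test_score
function mentioned in the question. The synthetic data, the choice of
LinearRegression, and the settings (cv=5, n_permutations=100) are assumptions
for illustration only. The p-value it returns refers to the cross-validated
score, not to any individual coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import permutation_test_score

# Synthetic data (assumed for illustration): y depends on the first
# feature only; the second feature is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(size=100)

# Refit the model on copies of the data with y shuffled, and compare the
# cross-validated R^2 on the real labels against those shuffled scores.
score, perm_scores, pvalue = permutation_test_score(
    LinearRegression(), X, y,
    scoring="r2", cv=5, n_permutations=100, random_state=0,
)

print("cross-validated R^2:", score)
print("permutation p-value:", pvalue)
```

Note that the smallest p-value this can report is 1 / (n_permutations + 1), so
the number of permutations limits the resolution of the test.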

Scikit-learn is firmly in the latter camp, machine learning. Of course,
there is plenty of overlap between the two cultures, and the divide is
somewhat fuzzy in practice, but it's a useful way to frame the issue. If
you're interested in statistical modeling rather than machine learning (and
it sounds like you are), scikit-learn is not really the right tool. You
might check out the statsmodels <http://statsmodels.sourceforge.net/>
package instead.
   Jake

 Jake VanderPlas
 Senior Data Science Fellow
 Director of Research in Physical Sciences
 University of Washington eScience Institute

On Fri, Feb 3, 2017 at 12:53 PM, Afarin Famili <
Afarin.Famili at utsouthwestern.edu> wrote:

> Hi all,
>
> I am aiming at calculating the p-value of regression models using
> scikit-learn, in order to report their statistical significance. Aside from
> permutation_test_score in scikit-learn, do you have any suggestions for
> calculating the p-value of the model? Ultimately, I am interested in
> computing the coefficient of determination, r2 as well as MSE to indicate
> the performance of the model for those models that were statistically
> significant.
>
> Thank you,
>
> Afarin