[scikit-learn] Confidence Estimation for Regressor Predictions

Dale T Smith Dale.T.Smith at macys.com
Fri Sep 2 08:21:27 EDT 2016


Roman,

Research from the 1970s that is not well known indicates that the bias in t-statistics, for instance, cancels out between the numerator and the denominator. I should have written something up showing how to do the relevant statistical diagnostics for ridge regression, but I was laid off from an earlier job.

Lasso regression is a very different story.


__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning
 | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com

-----Original Message-----
From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Roman Yurchak
Sent: Thursday, September 1, 2016 5:14 PM
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Confidence Estimation for Regressor Predictions

Dale, I meant for all the methods in sklearn.linear_model. Linear regression is well known, but for ridge regression, say, it does not look that simple: http://stats.stackexchange.com/a/15417
Thanks for mentioning the bootstrap method!

--
Roman

On 01/09/16 21:55, Dale T Smith wrote:
> Confidence intervals for linear models are well known; see any statistics book or look them up on Wikipedia. You should be able to calculate everything you need for a linear model just from the information the estimator provides. Note that the R-squared returned by the score method of the linear_model estimators is the ordinary coefficient of determination, not what statisticians call the adjusted R-squared.
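
The textbook calculation for ordinary least squares can be sketched roughly as follows. This is only an illustration under the usual assumption of independent Gaussian errors with constant variance; the synthetic data and all variable names are made up for the example, and scipy is used only for the t quantile.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only).
rng = np.random.RandomState(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=1.0, size=n)

model = LinearRegression().fit(X, y)

# Design matrix with an explicit intercept column, matching fit_intercept=True.
Xd = np.column_stack([np.ones(n), X])
resid = y - model.predict(X)
dof = n - Xd.shape[1]              # residual degrees of freedom
sigma2 = resid @ resid / dof       # unbiased estimate of the noise variance
XtX_inv = np.linalg.inv(Xd.T @ Xd)

# 95% confidence interval for the mean response at new points x0:
#   y_hat +/- t * sqrt(sigma2 * x0^T (X^T X)^{-1} x0)
X_new = rng.normal(size=(5, p))
Xd_new = np.column_stack([np.ones(len(X_new)), X_new])
y_hat = model.predict(X_new)
se_mean = np.sqrt(sigma2 * np.einsum("ij,jk,ik->i", Xd_new, XtX_inv, Xd_new))
t_crit = stats.t.ppf(0.975, dof)
ci_lower = y_hat - t_crit * se_mean
ci_upper = y_hat + t_crit * se_mean

# model.score returns the ordinary R^2; the adjusted version applies a
# degrees-of-freedom correction on top of it.
r2 = model.score(X, y)
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
```

For an interval on a single new observation rather than on the mean response, the usual change is to add sigma2 inside the square root.
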
> 
> 
> 
> -----Original Message-----
> From: scikit-learn 
> [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On 
> Behalf Of Roman Yurchak
> Sent: Thursday, September 1, 2016 3:45 PM
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Confidence Estimation for Regressor 
> Predictions
> 
> I'm also interested to know if there are any projects similar to scikit-learn-contrib/forest-confidence-interval for linear_model or SVM regressors.
> 
> In the general case, I think you could get a quick first-order approximation of the confidence interval for your regressor by taking the standard deviation of predictions obtained by fitting on different subsets of your data, e.g.
>      cross_validation.cross_val_score( ).std()
> with a fixed set of estimator parameters (note that cross_val_score itself returns one score per fold rather than predictions), or some multiple of it (e.g. 2*std). This will probably not exactly match the mathematical definition of a confidence interval, though.
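
One way to act on this idea is to refit a clone of the estimator on each training subset and collect its predictions for the points of interest directly. The sketch below does that with KFold from sklearn.model_selection, which replaced the cross_validation module in later releases. Ridge, the synthetic data, and the 2*std band are illustrative choices, not a calibrated interval.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Synthetic data (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=300)
X_new = rng.normal(size=(10, 4))

base = Ridge(alpha=1.0)  # fixed hyperparameters for every refit

# Fit one model per training subset and predict the same new points each time.
preds = []
for train_idx, _ in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    est = clone(base).fit(X[train_idx], y[train_idx])
    preds.append(est.predict(X_new))
preds = np.asarray(preds)  # shape (n_splits, n_new_points)

pred_mean = preds.mean(axis=0)
pred_std = preds.std(axis=0, ddof=1)  # spread across the refitted models
lower, upper = pred_mean - 2 * pred_std, pred_mean + 2 * pred_std
```

Because the training subsets overlap heavily, this spread will tend to understate the true variability of the predictions, so it is best read as a rough relative indicator.
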
> --
> Roman
> 
> 
> On 01/09/16 20:32, Dale T Smith wrote:
>> There is a scikit-learn-contrib project with confidence intervals for random forests.
>>
>> https://github.com/scikit-learn-contrib/forest-confidence-interval
>>
>>
>> -----Original Message-----
>> From: scikit-learn 
>> [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On 
>> Behalf Of Daniel Seeliger via scikit-learn
>> Sent: Thursday, September 1, 2016 2:28 PM
>> To: scikit-learn at python.org
>> Cc: Daniel Seeliger
>> Subject: [scikit-learn] Confidence Estimation for Regressor 
>> Predictions
>>
>> Dear all,
>>
>> For classifiers I make use of the predict_proba method to compute a Gini coefficient or entropy to get an estimate of how "sure" the model is about an individual prediction.
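
For reference, the classifier-side calculation described here might look like the sketch below. It assumes that "Gini coefficient" means the Gini impurity 1 - sum(p_k^2); the helper name prediction_uncertainty is hypothetical, and the input is any array shaped like the output of predict_proba.

```python
import numpy as np

def prediction_uncertainty(proba):
    """Per-sample Shannon entropy (in nats) and Gini impurity.

    `proba` has shape (n_samples, n_classes), e.g. the output of predict_proba.
    """
    proba = np.asarray(proba, dtype=float)
    # Treat 0 * log(0) as 0 by taking logs of positive entries only.
    logp = np.zeros_like(proba)
    np.log(proba, out=logp, where=proba > 0)
    entropy = -(proba * logp).sum(axis=1)
    gini = 1.0 - (proba ** 2).sum(axis=1)
    return entropy, gini

# Hand-written probabilities standing in for clf.predict_proba(X).
proba = np.array([[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]])
entropy, gini = prediction_uncertainty(proba)
```

Both quantities are zero for a fully confident prediction and largest when the classes are equally likely.
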
>>
>> Is there anything similar I could use for regression models? I guess for a RandomForest I could simply use the individual predictions of each tree in clf.estimators_ and compute a standard deviation, but I guess this is not a generic approach that I can use for other regressors like the GradientBoostingRegressor or an SVR.
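
The random forest idea can be sketched as follows, with synthetic data and illustrative variable names. The resulting spread measures how much the individual trees disagree; it is not a calibrated confidence interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=400)
X_new = rng.uniform(-3, 3, size=(5, 2))

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Each fitted tree in estimators_ is a DecisionTreeRegressor with its own predict().
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
tree_mean = per_tree.mean(axis=0)  # matches forest.predict(X_new)
tree_std = per_tree.std(axis=0)    # spread across trees, one value per sample
```
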
>>
>> Thanks a lot for your help,
>> Daniel
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
> 
> 
