Confidence Estimation for Regressor Predictions
Dear all, For classifiers I use the predict_proba method to compute a Gini coefficient or entropy as an estimate of how "sure" the model is about an individual prediction. Is there anything similar I could use for regression models? I guess for a RandomForest I could simply take the individual predictions of each tree in clf.estimators_ and compute a standard deviation, but I guess this is not a generic approach that I can use for other regressors such as GradientBoostingRegressor or SVR. Thanks a lot for your help, Daniel
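The per-tree idea from the question can be sketched as follows. This is only an illustration on synthetic data; the dataset, the variable names, and the choice of 100 trees are assumptions, and the resulting spread is a heuristic rather than a calibrated confidence interval.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration (not from the thread).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_new = X[:5]
# One row per fitted tree, one column per sample in X_new.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

mean_pred = per_tree.mean(axis=0)  # the forest prediction is the average over trees
spread = per_tree.std(axis=0)      # heuristic per-sample disagreement between trees
```

A large value in spread means the trees disagree on that sample. As the question already notes, this relies on estimators_ and therefore only applies to ensembles of this kind.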
There is a scikit-learn-contrib project with confidence intervals for random forests. https://github.com/scikit-learn-contrib/forest-confidence-interval
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith@macys.com
(In reply to Daniel Seeliger, Thursday, September 1, 2016 2:28 PM, Subject: [scikit-learn] Confidence Estimation for Regressor Predictions)
scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
I'm also interested to know whether there are any projects similar to scikit-learn-contrib/forest-confidence-interval for linear_model or SVM regressors. In the general case, I think you could get a quick first-order approximation of the confidence interval for your regressor by taking the standard deviation of the predictions obtained by fitting on different subsets of your data, e.g. using cross_validation.cross_val_score( ).std() with a fixed set of estimator parameters? Or some multiple of it (e.g. 2*std). This will probably not match the mathematical definition of a confidence interval exactly, though. -- Roman (In reply to Dale T Smith, 01/09/16 20:32)
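One way to read the suggestion above is to refit the same estimator on several training subsets and look at the spread of its predictions for the same new points. Note that cross_val_score returns one score per fold, so its standard deviation describes the variability of the score rather than of an individual prediction; the sketch below therefore collects predictions directly. The data, the SVR settings, and the variable names are assumptions for illustration. In current scikit-learn releases the cross_validation module has been replaced by model_selection.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

estimator = SVR(kernel="rbf", C=100.0)  # fixed hyperparameters (assumed values)
folds = KFold(n_splits=5, shuffle=True, random_state=0)

preds = []
for train_idx, _ in folds.split(X_train):
    # Fit a fresh copy on each training subset and predict the same new points.
    model = clone(estimator).fit(X_train[train_idx], y_train[train_idx])
    preds.append(model.predict(X_new))
preds = np.array(preds)  # shape (n_splits, n_new)

center = preds.mean(axis=0)
band = 2 * preds.std(axis=0)  # the "2*std" multiple mentioned above
```

Because the training subsets overlap heavily, the fitted models are correlated, so this band is a rough indicator of model stability and not a confidence interval in the statistical sense.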
Confidence intervals for linear models are well known - see any statistics book or look it up on Wikipedia. You should be able to calculate everything you need for a linear model just from the information the estimator provides. Note that the R-squared provided by linear_model appears to be what statisticians call the adjusted R-squared. Dale Smith (In reply to Roman Yurchak, Thursday, September 1, 2016 3:45 PM)
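As a minimal sketch of the textbook calculation referred to above, the following computes a 95% confidence interval for the mean response of an ordinary least squares fit. It assumes the usual model with independent Gaussian errors of constant variance, and it uses the training data in addition to the fitted estimator. The synthetic data and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)

# Design matrix with an explicit intercept column.
A = np.column_stack([np.ones(len(X)), X])
n, p = A.shape
residuals = y - model.predict(X)
sigma2 = residuals @ residuals / (n - p)  # unbiased estimate of the noise variance
xtx_inv = np.linalg.inv(A.T @ A)

X_new = np.array([[0.0, 0.0], [1.0, 1.0]])
A_new = np.column_stack([np.ones(len(X_new)), X_new])
# Standard error of the estimated mean response at each new point.
se_mean = np.sqrt(sigma2 * np.einsum("ij,jk,ik->i", A_new, xtx_inv, A_new))
t_crit = stats.t.ppf(0.975, df=n - p)

pred = model.predict(X_new)
ci_lower, ci_upper = pred - t_crit * se_mean, pred + t_crit * se_mean
```

For an interval on a single new observation rather than on the mean response, the usual formula adds the noise variance, i.e. it uses sqrt(sigma2 + se_mean**2) in place of se_mean.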
Dale, I meant for all the methods in sklearn.linear_model. Linear regression is well known, but for ridge regression, say, it does not look that simple: http://stats.stackexchange.com/a/15417 . Thanks for mentioning the bootstrap method! -- Roman (In reply to Dale T Smith, 01/09/16 21:55)
Hi All, I am also interested in determining a confidence level associated with an SVM or SVR prediction. Is there a nice way to generalize this confidence regardless of the kernel chosen for the given SVM or SVR implementation? Last year I tried the 'predict_proba' method, which did not work very well when implemented generically: - https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-15... The 'decision_function' performed a little better. But are my examples poor because the sample data is too small for accurate confidence measurements? Would both 'decision_function' and 'predict_proba' improve if my dataset were much larger, or should I customize these methods? Feel free to comment on the above GitHub issue. I've spent more time on the web tools of that repository than on understanding the fundamentals of predictions. Forgive me ahead of time. Thank you, Jeff Levesque https://github.com/jeff1evesque
(In reply to Roman Yurchak, Sep 1, 2016, 5:13 PM)
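For reference, the two SVC methods mentioned above can be compared side by side as in the sketch below. The synthetic dataset and the variable names are assumptions. The scikit-learn documentation notes that predict_proba for SVC is fitted with an internal cross-validation and can be unreliable on very small datasets, which may be relevant to the behaviour described.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X[:250], y[:250])

X_new = X[250:]
proba = clf.predict_proba(X_new)       # calibrated probabilities, one column per class
margin = clf.decision_function(X_new)  # signed score; its sign gives the predicted class
confidence = np.abs(margin)            # larger means further from the decision boundary
```

The magnitude of decision_function is not a probability and is not comparable across kernels or datasets, so it is better suited to ranking predictions within one fitted model than to reporting an absolute confidence level.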
I do not know of any research related to any estimators except linear_model and forests of trees. Knowledge of the underlying distributions is required for confidence intervals. The jackknife and the bootstrap are the most common methods to obtain this information from the data. If anyone knows of these techniques being applied more widely in machine learning to measure confidence intervals, please post the references. I think providing these measures in scikit-learn-contrib would give the entire project features that other packages don't have. Here's an example of the work done on the StatML side, "Distribution-Free Predictive Inference for Regression": http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf Note the use of leave-one-covariate-out to estimate variable importance. Dale Smith (In reply to Jeffrey Levesque, Friday, September 2, 2016 12:19 AM)
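A minimal sketch of split conformal prediction, one distribution-free procedure of the kind treated in the linked paper, might look like the following. It works with any regressor that has fit and predict; the GradientBoostingRegressor, the data split, alpha = 0.1, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=600, n_features=5, noise=10.0, random_state=0)
X_fit, y_fit = X[:250], y[:250]        # used to fit the model
X_cal, y_cal = X[250:500], y[250:500]  # held out to calibrate the interval width
X_new, y_new = X[500:], y[500:]

model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

alpha = 0.1  # target miscoverage, i.e. roughly 90% intervals
scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
n_cal = len(scores)
k = int(np.ceil((n_cal + 1) * (1 - alpha)))  # rank of the conformal quantile
q = scores[k - 1] if k <= n_cal else np.inf  # half-width of the interval

pred = model.predict(X_new)
lower, upper = pred - q, pred + q
coverage = np.mean((y_new >= lower) & (y_new <= upper))  # empirical check only
```

The guarantee is marginal (averaged over new points) and assumes exchangeable data; the interval has the same width everywhere, so it does not by itself indicate which individual predictions are less certain.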
Roman, Research from the 1970s that is not well known indicates that the bias for t-statistics, for instance, cancels out in the numerator and denominator. I should have written up something showing how to do the relevant statistical diagnostics for ridge regression, but I was laid off from an earlier job. Lasso regression is a very different story. Dale Smith (In reply to Roman Yurchak, Thursday, September 1, 2016 5:14 PM)
Maybe you can also use the bootstrap method published by Efron? See https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf As far as I understand, it is implemented in the resampling module with the replacement option. J. (In reply to Roman Yurchak, 1.9.2016 21:46)
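The bootstrap idea can be sketched with sklearn.utils.resample and replace=True, which may be the resampling utility meant above. The example uses Ridge because ridge regression came up earlier in the thread; the data, alpha=1.0, the number of replicates, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.utils import resample

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

estimator = Ridge(alpha=1.0)
n_boot = 200
boot_preds = np.empty((n_boot, len(X_new)))
for b in range(n_boot):
    # Draw a bootstrap sample (same size as the training set, with replacement).
    X_b, y_b = resample(X_train, y_train, replace=True, random_state=b)
    boot_preds[b] = clone(estimator).fit(X_b, y_b).predict(X_new)

# Percentile interval for the fitted value at each new point.
lower, upper = np.percentile(boot_preds, [2.5, 97.5], axis=0)
```

These percentile intervals describe the variability of the fitted model under resampling. They do not include the noise of a new observation and do not correct for the bias introduced by the ridge penalty.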
participants (5)
- Dale T Smith
- Daniel Seeliger
- Jeffrey Levesque
- Jiří Fejfar
- Roman Yurchak