[scikit-learn] Confidence and Prediction Intervals of Support Vector Regression

Sebastian Raschka se.raschka at gmail.com
Wed Mar 1 22:13:02 EST 2017


Glad to hear that it was at least a little bit helpful :) 
(haha, Efron and Tibshirani even have a whole ~500 pg book on bootstrap if you have the time and patience … :) https://www.crcpress.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)

> On Mar 1, 2017, at 10:07 PM, Raga Markely <raga.markely at gmail.com> wrote:
> 
> No worries, Sebastian :) .. thank you very much for your help.. I learned a lot of new things from your site today.. it led me to some relevant chapters in "The Elements of Statistical Learning", which then led me to chapter 8 page 264 about non-parametric & parametric bootstrap.. 
> 
> I think I will just go with the non-parametric bootstrap for my problem.. similar to the bootstrap steps i mentioned earlier..
> 
> Thank you!
> Raga
> 
> On Wed, Mar 1, 2017 at 9:44 PM, Sebastian Raschka <mail at sebastianraschka.com> wrote:
> Hi, Raga,
> 
> > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution?
> 
> phew, I am actually not sure anymore … I think it’s the percentile of the ACC_boot distribution, similar to the “classic” bootstrap but where ACC_boot got computed from weighted ACC_h,i and ACC_r,i
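
A minimal sketch of the weighting described above, assuming ACC_h,i is the accuracy on the held-out (out-of-bag) samples of bootstrap round i and ACC_r,i is the resubstitution accuracy of the same round. It uses the fixed 0.632/0.368 weights of the plain .632 bootstrap; the .632+ variant replaces them with a data-dependent weight derived from the no-information rate, which is not reproduced here. The function name and the example numbers are hypothetical.

```python
import numpy as np


def dot632_percentile_interval(acc_oob, acc_resub, lower=2.5, upper=97.5):
    """Percentile interval of the per-round weighted accuracy ACC_boot.

    acc_oob   : per-round out-of-bag accuracies (assumed to be ACC_h,i)
    acc_resub : per-round resubstitution accuracies (assumed to be ACC_r,i)
    """
    acc_oob = np.asarray(acc_oob, dtype=float)
    acc_resub = np.asarray(acc_resub, dtype=float)
    # Fixed-weight .632 combination for every bootstrap round
    acc_boot = 0.632 * acc_oob + 0.368 * acc_resub
    # Lower and upper percentiles of the ACC_boot distribution
    acc_lower, acc_upper = np.percentile(acc_boot, [lower, upper])
    return acc_boot.mean(), acc_lower, acc_upper


# Illustrative (made-up) per-round accuracies from five bootstrap rounds
acc_mean, acc_lower, acc_upper = dot632_percentile_interval(
    acc_oob=[0.80, 0.85, 0.78, 0.90, 0.82],
    acc_resub=[0.98, 1.00, 0.97, 0.99, 0.98],
)
```
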
> 
> >  2. For regression algorithms, is there a recommended equation for the no-information rate gamma?
> 
> 
> Sorry, can’t be of much help here; I am not sure what the equivalent of the no-information rate for regression would be ...
> 
> 
> 
> > On Mar 1, 2017, at 5:39 PM, Raga Markely <raga.markely at gmail.com> wrote:
> >
> > Thanks a lot, Sebastian! Very nicely written.
> >
> > I have a few follow-up questions:
> > 1. Just to make sure I understand correctly, using the .632+ bootstrap method, the ACC_lower and ACC_upper are the lower and higher percentile of the ACC_h,i distribution?
> > 2. For regression algorithms, is there a recommended equation for the no-information rate gamma?
> > 3. I need to plot the confidence interval and prediction interval for my Support Vector Regression prediction (just to clarify these intervals, please see an analogy from linear model on slide 14: http://www2.stat.duke.edu/~tjl13/s101/slides/unit6lec3H.pdf) - can I derive the intervals from .632+ bootstrap method or is there a different way of getting these intervals?
> >
> > Thank you!
> > Raga
> >
> >
> > On Wed, Mar 1, 2017 at 3:13 PM, Sebastian Raschka <se.raschka at gmail.com> wrote:
> > Hi, Raga,
> > I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps.
> >
> > Best,
> > Sebastian
> >
> > > On Mar 1, 2017, at 3:07 PM, Raga Markely <raga.markely at gmail.com> wrote:
> > >
> > > Hi everyone,
> > >
> > > I wonder if you could provide me with some suggestions on how to determine the confidence and prediction intervals of SVR? If you have suggestions for any machine learning algorithms in general, that would be fine too (doesn't have to be specific for SVR).
> > >
> > > So far, I have found:
> > > 1. Bootstrap: http://stats.stackexchange.com/questions/183230/bootstrapping-confidence-interval-from-a-regression-prediction
> > > 2. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0048723&type=printable
> > > 3. ftp://ftp.esat.kuleuven.ac.be/sista/suykens/reports/10_156_v0.pdf
> > >
> > > But I don't understand the details in #2 and #3 well enough to write step-by-step code. If I use the bootstrap method, can I get the confidence interval as follows?
> > > a. Draw bootstrap sample of size n
> > > b. Fit the SVR model (with hyperparameters chosen during model selection with grid search cv) to this bootstrap sample
> > > c. Use this model to predict the output variable y* from input variable X*
> > > d. Repeat steps a-c, for instance, 100 times
> > > e. Order the 100 values of y*, and determine, for instance, the 10th percentile and 90th percentile (if we are looking for an 80% confidence interval)
> > > f. Repeat a-e for different values of X* to plot the prediction with confidence interval
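
Steps a-f above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `bootstrap_prediction_band`, the toy sine data, and the `C`/`gamma` values are hypothetical placeholders for the data and the hyperparameters found by grid search. Predicting for all X* values within each round is equivalent to repeating a-e once per X* value.

```python
import numpy as np
from sklearn.svm import SVR


def bootstrap_prediction_band(X, y, X_star, n_rounds=100, lower=10, upper=90,
                              svr_params=None, seed=0):
    """Percentile band of SVR predictions at X_star over bootstrap refits."""
    rng = np.random.default_rng(seed)
    svr_params = svr_params or {}
    n = X.shape[0]
    preds = np.empty((n_rounds, X_star.shape[0]))
    for b in range(n_rounds):                      # step d: repeat a-c
        idx = rng.integers(0, n, size=n)           # step a: sample n rows with replacement
        model = SVR(**svr_params).fit(X[idx], y[idx])  # step b: refit with fixed hyperparameters
        preds[b] = model.predict(X_star)           # step c: predict y* for every X*
    # step e: percentiles across bootstrap rounds, separately for each X*
    band_lower, band_upper = np.percentile(preds, [lower, upper], axis=0)
    return band_lower, band_upper, preds


# Toy data standing in for the real problem (assumption)
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)
X_star = np.linspace(0, 5, 25).reshape(-1, 1)      # step f: grid of X* values

ci_lower, ci_upper, boot_preds = bootstrap_prediction_band(
    X, y, X_star, svr_params={"C": 10.0, "gamma": 0.5}
)
```

Plotting `ci_lower` and `ci_upper` against `X_star` gives the band around the fitted curve; more rounds than 100 make the tail percentiles less noisy.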
> > >
> > > But, I don't know how to get the prediction interval from here.
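
One common heuristic for an approximate prediction interval, which is not confirmed anywhere in this thread, is to add a randomly drawn out-of-bag residual to each bootstrap prediction so that the spread reflects observation noise as well as model uncertainty. The sketch below assumes the residuals are roughly exchangeable across X (similar noise level everywhere); the helper name, toy data, and hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def bootstrap_prediction_interval(X, y, X_star, n_rounds=200, lower=10,
                                  upper=90, svr_params=None, seed=0):
    """Approximate prediction interval: bootstrap prediction + OOB residual."""
    rng = np.random.default_rng(seed)
    svr_params = svr_params or {}
    n = X.shape[0]
    draws = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)           # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)      # rows left out of this sample
        if oob.size == 0:                          # extremely unlikely; skip the round
            continue
        model = SVR(**svr_params).fit(X[idx], y[idx])
        residuals = y[oob] - model.predict(X[oob])  # out-of-bag errors
        # One resampled residual per X* value, added to the prediction
        noise = rng.choice(residuals, size=X_star.shape[0])
        draws.append(model.predict(X_star) + noise)
    draws = np.asarray(draws)
    pi_lower, pi_upper = np.percentile(draws, [lower, upper], axis=0)
    return pi_lower, pi_upper


# Toy data standing in for the real problem (assumption)
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)
X_star = np.linspace(0, 5, 25).reshape(-1, 1)

pi_lower, pi_upper = bootstrap_prediction_interval(
    X, y, X_star, svr_params={"C": 10.0, "gamma": 0.5}
)
```

If the noise level clearly changes with X, the single pooled residual distribution used here would be a poor fit and the interval should not be trusted.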
> > >
> > > Thank you very much,
> > > Raga
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn at python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn


