Re: [SciPy-user] Bootstrap?
On Tue, Jul 7, 2009 at 6:28 AM, Joshua Stults <joshua.stults@gmail.com> wrote:
I was wondering if SciPy had something similar to Octave/Matlab's empirical_rnd(). Here's the blurb from Octave's help describing the function:
 -- Function File: empirical_rnd (N, DATA)
 -- Function File: empirical_rnd (DATA, R, C)
 -- Function File: empirical_rnd (DATA, SZ)
     Generate a bootstrap sample of size N from the empirical
     distribution obtained from the univariate sample DATA.

     If R and C are given create a matrix with R rows and C columns. Or
     if SZ is a vector, create a matrix of size SZ.
So basically you pass it an array of data, and it returns bootstrap samples (resampling from the array with replacement).
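On the NumPy side, sampling with replacement from the observed values is all that empirical_rnd does, and `numpy.random.Generator.choice` with `replace=True` covers it. The sketch below wraps that call in a hypothetical helper named `empirical_rnd` (not an actual SciPy or NumPy function) that accepts either an integer N or a shape tuple such as (R, C); note that its argument order is not Octave's.

```python
import numpy as np

def empirical_rnd(data, size, rng=None):
    """Draw a bootstrap sample from the empirical distribution of `data`.

    Hypothetical helper loosely mirroring Octave's empirical_rnd:
    `size` may be an int (N draws) or a tuple such as (R, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data).ravel()  # treat DATA as a univariate sample
    # Sampling with replacement from the observed values is exactly
    # sampling from the empirical distribution.
    return rng.choice(data, size=size, replace=True)

rng = np.random.default_rng(0)
data = np.array([2.1, 3.4, 1.9, 5.0, 4.2])
print(empirical_rnd(data, 5, rng))       # N = 5 draws
print(empirical_rnd(data, (2, 3), rng))  # a 2 x 3 matrix of draws
```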
Be very careful and be certain you can derive the statistical justification for what you are doing when you use bootstrap. There are numerous cases in which bootstrapping will not give you the right answer, such as when fitting a function that has a parameter that is set in just a small subset of the data, because in some samples the subset may be omitted completely or in large part, admitting wildly wrong parameter values. While you didn't specify exactly what you are trying to do, for many problems Markov-Chain Monte Carlo is both better and faster, and is often easier to code. Plus, there is Python for it (pymc, I think).

--jh--
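One way to see the problem Joe describes: a resample of size n leaves out any particular observation with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below uses the sample maximum as a stand-in for a parameter that is set by a single point (our choice of illustration, not from the thread) and counts how often that point vanishes from the resamples.

```python
import numpy as np

rng = np.random.default_rng(42)
n, B = 50, 20_000
data = rng.normal(size=n)
data[0] = 10.0  # one influential point that alone sets the sample maximum

idx = rng.integers(0, n, size=(B, n))  # B resamples of indices, with replacement
omitted = ~(idx == 0).any(axis=1)      # True where observation 0 was never drawn
boot_max = data[idx].max(axis=1)       # bootstrap distribution of the maximum

print("analytic omission probability:", (1 - 1 / n) ** n)
print("simulated omission fraction:  ", omitted.mean())
print("resamples whose max is not 10:", (boot_max < 10.0).mean())
```

For n = 50 the analytic value is about 0.364, so in roughly a third of the resamples the bootstrap maximum falls far below the observed one.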
Joe,

Thanks for the tip.

On Tue, Jul 7, 2009 at 9:36 PM, Joe Harrington <jh@physics.ucf.edu> wrote:
> Be very careful and be certain you can derive the statistical justification for what you are doing when you use bootstrap. There are numerous cases in which bootstrapping will not give you the right answer, such as when fitting a function that has a parameter that is set in just a small subset of the data, because in some samples the subset may be omitted completely or in large part, admitting wildly wrong parameter values.
I was doing a toy problem with 0-1 data (1 = success, 0 = failure), estimating a reliability, so my statistic was just sum(bootstrap_sample) / n. Does your criticism apply to bootstrapping the residuals too? I'd appreciate it if you could point me toward any accessible references (I'm not a statistician).
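The toy reliability problem can be sketched with random indexing in an explicit loop. The pass/fail data (18 successes, 2 failures), the number of resamples B, and the use of a percentile interval are illustrative assumptions, not details from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([1] * 18 + [0] * 2)  # hypothetical data: 1 = success, 0 = failure
n = data.size
B = 5000

stats = np.empty(B)
for b in range(B):
    # Random indexing: draw n indices with replacement, then apply the statistic
    bootstrap_sample = data[rng.integers(0, n, size=n)]
    stats[b] = bootstrap_sample.sum() / n

print("point estimate:         ", data.mean())
print("bootstrap SE:           ", stats.std(ddof=1))
print("95% percentile interval:", np.percentile(stats, [2.5, 97.5]))
```

With only two failures, about 0.9^20 (roughly 12%) of the resamples contain no failure at all, so the upper end of the percentile interval sits at exactly 1.0. Boundary effects like this are one situation in which the plain percentile bootstrap deserves suspicion.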
> While you didn't specify exactly what you are trying to do, for many problems Markov-Chain Monte Carlo is both better and faster, and is often easier to code. Plus, there is Python for it (pymc, I think).
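For a concrete sense of the MCMC alternative, here is a minimal random-walk Metropolis sampler for the same reliability parameter p. It is written with plain NumPy rather than pymc, so no library API is assumed. It also assumes a Bernoulli likelihood and a uniform prior on p (both our assumptions); with that prior the exact posterior is Beta(s + 1, f + 1), which gives a check on the sampler.

```python
import numpy as np

def log_posterior(p, successes, failures):
    # Bernoulli likelihood with a uniform Beta(1, 1) prior on p (assumed prior)
    if not 0.0 < p < 1.0:
        return -np.inf
    return successes * np.log(p) + failures * np.log1p(-p)

def metropolis(successes, failures, n_steps=20_000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    p = 0.5
    lp = log_posterior(p, successes, failures)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = p + step * rng.normal()  # symmetric random-walk proposal
        lp_new = log_posterior(proposal, successes, failures)
        if np.log(rng.uniform()) < lp_new - lp:  # Metropolis acceptance test
            p, lp = proposal, lp_new
        chain[i] = p  # record current state whether or not the move was accepted
    return chain

chain = metropolis(successes=18, failures=2)[2_000:]  # drop burn-in
print("posterior mean (MCMC): ", chain.mean())
print("posterior mean (exact):", (18 + 1) / (20 + 2))
print("95% credible interval: ", np.percentile(chain, [2.5, 97.5]))
```

In a conjugate case like this one the sampler is not really needed, since the Beta posterior is available in closed form; MCMC earns its keep when the model has no such closed form. The step size and burn-in length here are untuned guesses.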
Could you give an example where it's easier to code an MCMC method? Doing a bootstrap is one or two lines of code in most high-level languages (e.g. Matlab/Octave), and it turns out to be the same in Python using the random indexing method that Josef and Ernest posted (of course, you have to put it in an interpreted loop, which is not very scalable).
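The interpreted loop can be avoided by drawing all the indices at once as an (n_boot, n) integer array whose rows are independent resamples. The helper name `bootstrap_stat` is hypothetical, and it needs a statistic that accepts an `axis` argument, such as `np.mean` or `np.median`.

```python
import numpy as np

def bootstrap_stat(data, stat, n_boot, rng):
    """Vectorized bootstrap: one index matrix instead of a Python loop.

    `stat` must accept an `axis` keyword (e.g. np.mean, np.median).
    """
    data = np.asarray(data)
    n = data.size
    idx = rng.integers(0, n, size=(n_boot, n))  # each row is one resample of indices
    return stat(data[idx], axis=1)              # apply the statistic row by row

rng = np.random.default_rng(0)
data = np.array([1] * 18 + [0] * 2)  # hypothetical pass/fail data
reps = bootstrap_stat(data, np.mean, 10_000, rng)
print(reps.shape, reps.mean(), reps.std(ddof=1))
```

The trade is memory for speed: the index array holds n_boot × n integers, so large problems may need to be processed in chunks. SciPy 1.7 and later also provide `scipy.stats.bootstrap`, which postdates this thread.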
_______________________________________________
SciPy-user mailing list
SciPy-user@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Thanks again; I've been consistently impressed by the quality of responses on this list.

--
Joshua Stults
Website: http://j-stults.blogspot.com
participants (2)
- Joe Harrington
- Joshua Stults