Hi all,
I was trying to use the `scipy.stats.bootstrap` function and wasn't sure
I understood the format of the `data` argument. Happy to make a PR if my
understanding is correct and the docs can be improved. The docs say:
> Each element of data is a sample from an underlying distribution.
I think I'm confused about what the elements of `data` mean versus the
dimension of the arrays that make up the iterable `data`. It's also not
clear which things are assumed to be a "sample from an underlying
distribution", e.g., each element of `data` or the elements of the arrays
in `data`.
Based on the examples, it seems like `bootstrap` can be used in these
scenarios:
- A 1d array of data, x, with a statistic that takes a single argument,
like `np.mean`. In this case, `data=(x,)`.
- A set of (paired) 1d arrays of data, (x, y), with a statistic that
takes multiple arguments, like `pearsonr`. In this case, `data=(x, y)`.
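A minimal sketch of these two cases, as I understand them. The synthetic data, the `pearson_r` helper, and the small `n_resamples` are my own choices for illustration, and `paired=True` assumes a SciPy version that supports it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = x + rng.normal(scale=0.5, size=50)  # correlated with x by construction

# Case 1: a single 1d sample and a one-argument statistic.
# `data` is a 1-tuple holding the sample, not the bare array.
res_mean = stats.bootstrap((x,), np.mean, vectorized=True,
                           n_resamples=999, random_state=rng)

# Case 2: paired 1d samples and a two-argument statistic.
# Hypothetical helper: keep only the correlation coefficient from pearsonr.
def pearson_r(a, b):
    return stats.pearsonr(a, b)[0]

res_r = stats.bootstrap((x, y), pearson_r, paired=True, vectorized=False,
                        n_resamples=999, random_state=rng)

print(res_mean.confidence_interval)
print(res_r.confidence_interval)
```

Here `len(data)` matches the number of positional arguments the statistic takes (1 for `np.mean`, 2 for `pearson_r`).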
I think this scenario is possible based on the examples, but I wasn't sure
based on the docs:
- An N-d array of data, x, with a statistic that you want to compute
along one axis, returning bootstrap statistics for each element of the
remaining dimensions, e.g., to bootstrap multiple datasets at once. In
this case, `data=(x,)` with `x.ndim > 1`.
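A sketch of what I have in mind for the N-d case, assuming `axis` selects the observation axis and the remaining axes are treated as separate datasets. The array shape is made up, and `method="percentile"` is chosen only to keep the example simple:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 40))  # 3 datasets, 40 observations each

# The statistic reduces the observation axis, so its output has one
# fewer dimension than its input.
print(np.mean(x, axis=-1).shape)

res = stats.bootstrap((x,), np.mean, axis=-1, vectorized=True,
                      method="percentile", n_resamples=999,
                      random_state=rng)

# One confidence interval and one standard error per dataset.
print(res.confidence_interval.low.shape)
print(res.standard_error.shape)
```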
I'm not clear if `bootstrap` can be used in this scenario:
- `statistic` is implicitly computed along multiple axes and requires a
vector of features per sample (e.g., R^2 from fitting a linear model to
bootstrapped multivariate samples).
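One workaround that might cover this case is to bootstrap row indices rather than the data itself, so each resample picks whole feature vectors. This is only a sketch under my own assumptions, not something the docs describe: the `r_squared` helper, the synthetic `X` and `y`, and the choice of `method="percentile"` are all made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))  # one feature vector per row
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Hypothetical statistic: takes resampled row indices, fits ordinary
# least squares with an intercept on those rows, and returns R^2.
def r_squared(idx):
    idx = np.asarray(idx).astype(int)
    A = np.column_stack([np.ones(idx.size), X[idx]])
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    resid = y[idx] - A @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y[idx] - y[idx].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# The "sample" handed to bootstrap is just the 1d array of row indices.
res = stats.bootstrap((np.arange(n),), r_squared, vectorized=False,
                      method="percentile", n_resamples=999,
                      random_state=rng)

print(res.confidence_interval)
```

The statistic still sees a single 1d sample, so it returns a scalar per resample even though the underlying data are 2d.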
*Is my understanding of the following correct?*
- `len(data)` should be equal to the number of args required by
`statistic`? Is there any other reason to have `len(data) > 1`?
- Assuming `statistic` can be computed along an axis, it has to return an
array of statistics with exactly one fewer dimension than the input arrays.
For instance, if x is 2d, `np.mean(x, axis=-1)` returns a 1d array of means
and is allowed. A multiclass logistic regression, by contrast, would take
in 2d arrays (x, y) and return a scalar (accuracy), which would not be
allowed.
Thanks for any help,
Jesse
--
Jesse Livezey
he/him/his