On Thu, Sep 1, 2011 at 6:02 PM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
Hi--I've just submitted a numpy 2.0 pull request for a new function, sample, in np.random. It's essentially an implementation of R's sample function: it allows possibly non-uniform, possibly without-replacement sampling from a given 1-D array-like. This is very useful for quickly and cleanly creating samples from, for example, a list of strings or a list of non-contiguous, unevenly spaced integers, both of which occur in data analysis with categorical data.
It is essentially a convenience function that wraps a number of existing ways to take a random sample. I think it belongs in numpy.random rather than scipy.stats because it's just a random sampler, not a probability distribution. It isn't possible to define a scipy.stats discrete random variable on strings--it would instead have to be defined on the indices of the list containing the possible samples. And (as far as I can tell) the scipy.stats distributions can't be used for sampling without replacement.
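For illustration, here is a minimal sketch of the kind of wrapper described above, written against NumPy's Generator API (numpy.random.default_rng, which only arrived in NumPy 1.17, long after this thread). The name sample and the signature (a, size, replace, p, rng) are assumptions modeled on R's sample; this is not the code in the pull request. It samples integer indices and then indexes into the array, which is why strings and irregularly spaced integers work. Each branch wraps an existing technique: uniform integers, an inverse-CDF lookup with np.searchsorted, a prefix of a random permutation, and, for weighted sampling without replacement, one common definition that draws one item at a time and renormalizes the remaining weights.

```python
import numpy as np


def sample(a, size, replace=True, p=None, rng=None):
    """Draw `size` items from the 1-D array-like `a` (hypothetical R-style helper).

    Sampling is done on integer indices, so `a` may hold strings or
    irregularly spaced integers.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("a must be 1-D")
    n = a.shape[0]
    if p is not None:
        p = np.asarray(p, dtype=float)
        if p.shape != (n,) or np.any(p < 0) or not np.isclose(p.sum(), 1.0):
            raise ValueError("p must hold n non-negative probabilities summing to 1")
    if not replace:
        # Without replacement we can only draw items that have positive weight.
        available = n if p is None else np.count_nonzero(p)
        if size > available:
            raise ValueError("not enough items to sample without replacement")

    if replace and p is None:
        # Uniform, with replacement: independent uniform integer indices.
        idx = rng.integers(0, n, size=size)
    elif replace:
        # Weighted, with replacement: invert the cumulative distribution.
        cdf = np.cumsum(p)
        cdf /= cdf[-1]  # force the last bin edge to exactly 1.0
        idx = np.searchsorted(cdf, rng.random(size), side="right")
    elif p is None:
        # Uniform, without replacement: a prefix of a random permutation.
        idx = rng.permutation(n)[:size]
    else:
        # Weighted, without replacement: draw one index at a time, then set
        # its weight to zero so the remaining weights are renormalized.
        weights = p.copy()
        idx = np.empty(size, dtype=np.intp)
        for i in range(size):
            cdf = np.cumsum(weights)
            j = np.searchsorted(cdf, rng.random() * cdf[-1], side="right")
            idx[i] = j
            weights[j] = 0.0
    return a[idx]


rng = np.random.default_rng(0)
labels = ["low", "mid", "high"]
print(sample(labels, 5, p=[0.2, 0.3, 0.5], rng=rng))  # weighted, with replacement
print(sample([3, 17, 42, 108], 2, replace=False, rng=rng))  # uniform, without replacement
```

Later NumPy releases expose this functionality directly as numpy.random.choice (and Generator.choice), with the same replace and p parameters, so in current code the built-in is the better choice.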
I don't think you can kill numpy.random.random and similar functions mixed in with a commit that adds a new function. These functions would first need to be deprecated.

Regarding "it does not break the API as the previous function was not in the docs": this is a doc bug, I assume. I don't think it means users and developers don't rely on it. Searching for np.random.random shows 120 threads in my Gmail reader; Python's own random module has random.random(); dir(np.random) shows it; and I copied it from mailing list examples. It's also used quite a bit in scipy, as I saw thanks to your work.

I also find the historical multiplicity of aliases confusing, but deciding which names should be deprecated would at least require a discussion and a separate commit.

Josef
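Josef's point that old names should go through a deprecation period before removal can be illustrated with a common plain-Python pattern: keep the old name as a thin wrapper that emits a DeprecationWarning and forwards to the preferred name. This is a minimal sketch, not NumPy's actual mechanism; random_sample, random, and the helper deprecated_alias are illustrative stand-ins defined here.

```python
import functools
import random as pyrandom
import warnings


def random_sample(n):
    """Hypothetical preferred name: return n floats in [0, 1)."""
    return [pyrandom.random() for _ in range(n)]


def deprecated_alias(func, old_name):
    """Wrap `func` so that calling it under `old_name` emits a DeprecationWarning."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this wrapper
        )
        return func(*args, **kwargs)

    return wrapper


# During the deprecation period the old name still works but warns;
# removing it later would be a separate commit.
random = deprecated_alias(random_sample, "random")
```

Existing callers of random(...) keep working unchanged while being told which name to switch to.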
-Chris JS

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion