ENH: Efficient vectorized sampling without replacement

Hello,

NumPy provides efficient, vectorized methods for generating random samples of an array with replacement. However, it lacks similar functionality for drawing many independent samples *without replacement* in a single vectorized call: `Generator.choice(..., replace=False)` produces only one such sample per call. To address this limitation, I developed a function that performs this task. It achieves approximately a 30x performance improvement over a basic Python loop for small sample sizes (and approximately a 2x improvement when using numba).

Could this functionality, or something similar, be integrated into NumPy? See also this issue <https://github.com/numpy/numpy/issues/28084>.

Kind regards,
Mark

```python
import numpy as np


def random_choice_without_replacement(array, sample_size, n_iterations):
    """
    Generates random samples from a given array without replacement.

    Parameters
    ----------
    array : array-like
        Array from which to draw the random samples.
    sample_size : int
        Number of random samples to draw without replacement per iteration.
    n_iterations : int
        Number of iterations to generate random samples.

    Returns
    -------
    random_samples : ndarray
        The generated random samples.

    Raises
    ------
    ValueError
        If sample_size is greater than the population size.

    Examples
    --------
    Generate 10 random samples of size 3, without replacement, from np.arange(5).

    >>> array = np.arange(5)
    >>> random_choice_without_replacement(array, 3, 10)
    array([[4, 0, 1],
           [1, 4, 0],
           [1, 3, 2],
           [0, 1, 3],
           [1, 0, 2],
           [3, 2, 4],
           [0, 3, 1],
           [1, 3, 4],
           [3, 1, 4],
           [0, 1, 3]])  # random

    Generate 4 random samples of size 3, without replacement, from a 2-D
    array (rows are sampled along the first axis).

    >>> array = np.arange(10).reshape(5, 2)
    >>> random_choice_without_replacement(array, 3, 4)
    array([[[0, 1],
            [8, 9],
            [4, 5]],

           [[2, 3],
            [8, 9],
            [0, 1]],

           [[0, 1],
            [2, 3],
            [8, 9]],

           [[4, 5],
            [2, 3],
            [8, 9]]])  # random
    """
    # Accept lists and other array-likes, as promised in the docstring.
    array = np.asarray(array)
    n = len(array)
    if sample_size > n:
        raise ValueError(
            f"sample_size ({sample_size}) is greater than the population size ({n})."
        )
    # One row of candidate indices per iteration.
    indices = np.tile(np.arange(n), (n_iterations, 1))
    random_samples = np.empty((n_iterations, sample_size), dtype=int)
    rng = np.random.default_rng()
    # At step i, only positions 0..int_max of each row are still available.
    for i, int_max in zip(range(sample_size), reversed(range(n - sample_size, n))):
        # One random position per row in [0, int_max].
        random_indices = rng.integers(0, int_max + 1, size=(n_iterations, 1))
        random_samples[:, i] = np.take_along_axis(indices, random_indices, axis=-1)[:, 0]
        # Fill the chosen slot with the last available index so it cannot be drawn again.
        np.put_along_axis(indices, random_indices, indices[:, int_max:int_max + 1], axis=-1)
    return array[random_samples]
```
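To make the core step of the vectorized loop easier to follow, here is a minimal sketch of the same update applied to a single row. As far as I can read the code, it is a partial Fisher-Yates shuffle: record the value at a random position in the active prefix, then overwrite that slot with the last active value. The block also includes the kind of plain per-iteration loop the timing comparison refers to. Both helpers, `partial_fisher_yates` and `loop_baseline`, are hypothetical names introduced for illustration, and the assumption that the 30x figure was measured against a loop of `Generator.choice(..., replace=False)` calls is mine, not something stated above.

```python
import numpy as np


def partial_fisher_yates(n, sample_size, rng):
    """Hypothetical helper: draw sample_size distinct indices from range(n).

    Scalar version of the swap-with-last step that the vectorized function
    applies to every row of its `indices` matrix at once.
    """
    indices = np.arange(n)
    out = np.empty(sample_size, dtype=int)
    # int_max walks down from n - 1; positions above it are already used.
    for i, int_max in enumerate(range(n - 1, n - sample_size - 1, -1)):
        j = rng.integers(0, int_max + 1)  # uniform position in the active prefix
        out[i] = indices[j]               # record the drawn index
        indices[j] = indices[int_max]     # move the last active index into the hole
    return out


def loop_baseline(array, sample_size, n_iterations, rng):
    """Hypothetical baseline: one Generator.choice call per iteration."""
    array = np.asarray(array)
    return np.stack(
        [rng.choice(array, size=sample_size, replace=False) for _ in range(n_iterations)]
    )


rng = np.random.default_rng(0)
print(partial_fisher_yates(10, 4, rng))               # four distinct indices in 0..9
print(loop_baseline(np.arange(5), 3, 2, rng).shape)   # (2, 3)
```

Because each row only records the drawn value and fills the hole, the vectorized version never shuffles a whole row: it performs `sample_size` steps of O(`n_iterations`) work each, on top of the O(`n_iterations * len(array)`) memory used by `np.tile`.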