Hi all,


I have created a new PR (https://github.com/scipy/scipy/pull/17955) to add new features related to the generation of random variates. If you are interested, take a look at the PR and join the discussion there.

Background:

In `stats.sampling`, various methods are implemented to generate random variates based on a specified distribution (the methods are from the UNURAN C library, Tirth implemented this as part of GSoC 2022). The tools are mainly for advanced users and are not very easy to use. In a follow-up project (Numfocus SDG), I worked with one of the UNURAN authors to make these features more easily accessible: we created a config for a large number of the SciPy distributions (36 to be precise) such that `NumericalInversePolynomial` can be directly used w/o having to care about the settings. For example, to sample from the Gamma distribution, one can just write

rng = stats.sampling.FastGeneratorInversion('gamma', (1.3,))
rng.rvs(size=10)

Further supported features:

- a fast inverse CDF `ppf_fast` that is computed when the object is created
- sampling with quasi-random numbers from `stats.qmc` (`qrvs`)
- truncation: all distributions can be truncated/restricted to an interval using the keyword `domain` (so in addition to the 36 distributions, a few more like truncnorm and truncweibull_min are covered)
- relying on the usual distribution infrastructure, methods like `pdf`, `cdf` are added to the distribution object (this is important in case truncation is used, the existing infrastructure does not support the evaluation of the pdf, cdf)

For a lot of distributions, the performance of rvs is better than that of `rv_continuous` if large samples are required (for small samples, the cost of the setup step to derive `ppf_fast` is too high compared to the marginal sampling times). Below are some examples (time in ms, generation of 10**7 rvs, statistics are computed over 10 trials). For the new method, one has to take into account the setup time + the marginal sampling time:

----- beta with parameters (2.5, 3.9) -----
New method setup speed (min., mean, std): 31.3, 48.2, 17.5
New method sampling speed (min., mean, std): 469.2, 540.2, 52.1
SciPy method sampling speed (min., mean, std): 1305.7, 1467.1, 137.6

----- crystalball with parameters (0.7, 1.2) -----
New method setup speed (min., mean, std): 66.5, 88.7, 19.8
New method sampling speed (min., mean, std): 399.6, 451.8, 45.2
SciPy method sampling speed (min., mean, std): 2515.3, 2809.9, 236.1

----- gennorm with parameters (2.7,) -----
New method setup speed (min., mean, std): 12.0, 16.8, 2.0
New method sampling speed (min., mean, std): 395.4, 414.5, 17.0
SciPy method sampling speed (min., mean, std): 1798.0, 1933.1, 98.5


The functionality is not supposed to replace the existing `rv_continuous` infrastructure: the main idea is to make the UNURAN functionality more easily available and to allow truncation and faster qrvs sampling for a large number of SciPy distributions.

I also want to thank Pamphile for providing helpful advice during the development.

Best regards,

Christoph