[SciPy-Dev] GSoC: Integrating library UNU.RAN in SciPy

Tue Jun 8 12:52:58 EDT 2021

Hi all,

I have been working on integrating UNU.RAN in scipy.stats for a few weeks
on my fork (https://github.com/tirthasheshpatel/scipy/pull/6) with my
mentors, Christoph and Nicholas. The design is still fairly basic but is
almost ready for initial reviews. Also, it builds on Windows, macOS, and
Linux. Tests pass with an exception of unrelated failures on Linux. I am
planning to propose a PR on SciPy this week but before I do that, I thought
it would be a good idea to summarize the progress here.

The methods I propose to add and details regarding the parameters they take
can be found in this excel sheet:
https://docs.google.com/spreadsheets/d/1r36HypXwpit7sHt9YAe3K7yPL7HNNuQjTkbVYQIJ4xI/edit?usp=sharing.
Feel free to comment on the sheet if you have any suggestions.

The API looks something like this:

    import numpy as np
    from scipy.stats import TransformedDensityRejection

    pdf = lambda x: 1 - x*x
    dpdf = lambda x: -2*x

    urng = np.random.default_rng(123)
    rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=urng)

    # sample a 100,000 samples from the distribution
    rvs = rng.rvs(100_000)

    # evaluate the upper bound of the PPF at some points in the domain
    u = np.linspace(0, 1, num=1000)
    ppf_hat = rng.ppf_hat(u)

Internal API
------------

For Python callbacks, I have used the `ccallback` API and used
setjmp/longjmp for error handling. Using `ctypes` gives similar performance
but I wasn't able to figure out a way to handle errors occurring in UNU.RAN
and in the callbacks. Here is the C code I wrote to acquire and release the
callbacks:
https://github.com/tirthasheshpatel/scipy/blob/unuran_c_thunk/scipy/stats/unuran/unuran_callback.h

For more details on the internal design of the API, I have attempted to
create a few flowcharts on draw.io but they are very much a WIP:
https://drive.google.com/file/d/1TH70SSvvc5eF6-YmDO8kFNvLqKaGROW-/view?usp=sharing

Other Notes
-----------

I think NumPy 1.19 introduced a stable interface for its Generator API.
Before that the RandomState API was stable. But RandomState doesn't have a
Cython Interface and its Python interface is very slow to be practically
useful. For instance, it takes around 7ms to sample 100,000 samples from
the normal distribution using the TDR method on NumPy >= 1.19 while it
takes around 911ms on NumPy == 1.18 which is around 130 times slower! Is
there a plan to drop 1.16 soon and can we use the unstable Generator API
from 1.17 onwards or would it too unsafe? Maybe this discussion isn't
suited here but I just thought to put it out as a note.

Anyways, please feel free to comment on the design, naming convention, etc
and suggest changes/ideas that would be good to incorporate before
proposing a PR on SciPy. Thanks for reading this!

--
Kind Regards,
Tirth Patel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210608/bbc9adda/attachment.html>