Hi all,

I have been working on integrating UNU.RAN in scipy.stats for a few weeks on my fork (https://github.com/tirthasheshpatel/scipy/pull/6) with my mentors, Christoph and Nicholas. The design is still fairly basic but is almost ready for initial reviews. It builds on Windows, macOS, and Linux, and the tests pass with the exception of some unrelated failures on Linux. I am planning to propose a PR on SciPy this week, but before I do, I thought it would be a good idea to summarize the progress here.

The methods I propose to add, and details of the parameters they take, can be found in this spreadsheet: https://docs.google.com/spreadsheets/d/1r36HypXwpit7sHt9YAe3K7yPL7HNNuQjTkbV.... Feel free to comment on the sheet if you have any suggestions.

The API looks something like this:

    import numpy as np
    from scipy.stats import TransformedDensityRejection

    pdf = lambda x: 1 - x*x
    dpdf = lambda x: -2*x

    urng = np.random.default_rng(123)
    rng = TransformedDensityRejection(pdf, dpdf, domain=(-1, 1), seed=urng)

    # draw 100,000 samples from the distribution
    rvs = rng.rvs(100_000)

    # evaluate the upper bound of the PPF at some points in the domain
    u = np.linspace(0, 1, num=1000)
    ppf_hat = rng.ppf_hat(u)

Internal API
------------

For Python callbacks, I have used the `ccallback` API with setjmp/longjmp for error handling. Using `ctypes` gives similar performance, but I wasn't able to figure out a way to handle errors occurring in UNU.RAN and in the callbacks. Here is the C code I wrote to acquire and release the callbacks: https://github.com/tirthasheshpatel/scipy/blob/unuran_c_thunk/scipy/stats/un...

For more details on the internal design of the API, I have attempted to create a few flowcharts on draw.io, but they are very much a WIP: https://drive.google.com/file/d/1TH70SSvvc5eF6-YmDO8kFNvLqKaGROW-/view?usp=s...

Other Notes
-----------

I think NumPy 1.19 introduced a stable interface for its Generator API; before that, the RandomState API was the stable one. But RandomState doesn't have a Cython interface, and its Python interface is too slow to be practically useful. For instance, it takes around 7 ms to draw 100,000 samples from the normal distribution using the TDR method on NumPy >= 1.19, while it takes around 911 ms on NumPy == 1.18, which is around 130 times slower! Is there a plan to drop 1.16 soon, and can we use the unstable Generator API from 1.17 onwards, or would that be too unsafe? Maybe this discussion isn't suited here, but I just thought to put it out as a note.

Anyways, please feel free to comment on the design, naming convention, etc. and suggest changes/ideas that would be good to incorporate before proposing a PR on SciPy. Thanks for reading this!

--
Kind Regards,
Tirth Patel
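[Editor's note: for readers unfamiliar with rejection methods, the density in the example above can also be sampled with plain (non-transformed) rejection sampling using only NumPy. This is a minimal illustrative sketch of the general idea that TDR refines, not the TDR algorithm or the proposed SciPy API; the function name `rejection_sample` is made up for this example.]

```python
import numpy as np

def rejection_sample(pdf, domain, pdf_max, n, rng):
    """Plain rejection sampling: propose x uniformly on the domain and
    accept it with probability pdf(x) / pdf_max."""
    lo, hi = domain
    out = np.empty(0)
    while out.size < n:
        x = rng.uniform(lo, hi, size=2 * n)        # proposal points
        u = rng.uniform(0.0, pdf_max, size=2 * n)  # vertical coordinates
        out = np.concatenate([out, x[u < pdf(x)]]) # keep points under the curve
    return out[:n]

pdf = lambda x: 1 - x * x   # unnormalized density on (-1, 1)
rng = np.random.default_rng(123)
samples = rejection_sample(pdf, (-1, 1), pdf_max=1.0, n=100_000, rng=rng)
```

TDR improves on this by building tight piecewise hat and squeeze functions from the (transformed) density and its derivative, which is why it also needs `dpdf`.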
On Tue, Jun 8, 2021 at 6:53 PM Tirth Patel <tirthasheshpatel@gmail.com> wrote:
Thanks for the nice overview Tirth!
I think NumPy 1.19 introduced a stable interface for its Generator API; before that, the RandomState API was the stable one. But RandomState doesn't have a Cython interface, and its Python interface is too slow to be practically useful. For instance, it takes around 7 ms to draw 100,000 samples from the normal distribution using the TDR method on NumPy >= 1.19, while it takes around 911 ms on NumPy == 1.18, which is around 130 times slower! Is there a plan to drop 1.16 soon, and can we use the unstable Generator API from 1.17 onwards, or would that be too unsafe? Maybe this discussion isn't suited here, but I just thought to put it out as a note.
We can drop NumPy 1.16 right now. I'm not sure if the 1.17 C API for numpy.random is usable - it was either missing some features or not present at all. What the recently added biasedurn does is a conditional compile - only use that C API for NumPy >= 1.19. If the performance on <1.19 isn't completely unusable, that may be a good option? Cheers, Ralf
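[Editor's note: the conditional-compile idea Ralf describes can be mirrored at the Python level by gating on the NumPy version. A minimal sketch, assuming only that the fast path requires NumPy >= 1.19; `sample_uniform` and the flag name are illustrative, not SciPy or biasedurn internals.]

```python
import numpy as np
from numpy.lib import NumpyVersion

# True when the stable Generator C API is available (NumPy >= 1.19);
# analogous to the #if guard biasedurn uses at compile time.
HAVE_STABLE_GENERATOR_API = NumpyVersion(np.__version__) >= NumpyVersion('1.19.0')

def sample_uniform(n, seed=None):
    """Draw n uniforms, using Generator when available, else RandomState."""
    if HAVE_STABLE_GENERATOR_API:
        return np.random.default_rng(seed).random(n)
    # Slower fallback path for old NumPy; note the different method name.
    return np.random.RandomState(seed).random_sample(n)
```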
-- Kind Regards, Tirth Patel

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev
Thanks Tirth for the heads up!
On 08.06.2021, at 19:43, Ralf Gommers <ralf.gommers@gmail.com> wrote:
We can drop NumPy 1.16 right now.
Glad to read that! I will be happy to work on this as soon as I am back; there is quite some cleaning we can do on the generator side thanks to this. Cheers, Pamphile
Hi Ralf,

On Tue, Jun 8, 2021 at 11:14 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
We can drop NumPy 1.16 right now. I'm not sure if the 1.17 C API for numpy.random is usable - it was either missing some features or not present at all.
Nice to hear that we don't need to support 1.16 now! With that, I think there is a possibility of using the Cython API of NumPy's BitGenerator to speed things up. I checked out a few NumPy releases and found that BitGenerator was added in 1.17 with a Cython API. All we need are the `next_double` function and the `state` member of the `bitgen_t` object, which are present from 1.17 onwards. The only difference is that 1.17 keeps `bitgen_t` in the `numpy.random.common` module, while it was moved to `numpy.random` from 1.18 onwards. I don't know if there are any known bugs in 1.17 and 1.18 from before the API became stable in 1.19. If not, we might be able to use the unstable NumPy Generator in 1.17 and 1.18 for our purpose. What do you think?

What the recently added biasedurn does is a conditional compile - only use that C API for NumPy >= 1.19. If the performance on <1.19 isn't completely unusable, that may be a good option?
Yes, that's what I do right now. But I am just worried that the performance on <1.19 is too slow to practically rely on. Anyways, thanks for looking into this!
On Wed, Jun 9, 2021 at 12:16 PM Tirth Patel <tirthasheshpatel@gmail.com> wrote:
Nice to hear that we don't need to support 1.16 now! With that, I think there is a possibility of using the Cython API of NumPy's BitGenerator to speed things up. I checked out a few NumPy releases and found that BitGenerator was added in 1.17 with a Cython API. All we need are the `next_double` function and the `state` member of the `bitgen_t` object, which are present from 1.17 onwards. The only difference is that 1.17 keeps `bitgen_t` in the `numpy.random.common` module, while it was moved to `numpy.random` from 1.18 onwards. I don't know if there are any known bugs in 1.17 and 1.18 from before the API became stable in 1.19. If not, we might be able to use the unstable NumPy Generator in 1.17 and 1.18 for our purpose. What do you think?
I think this is fine - if it helps that much, let's just try it. If it passes all tests with the latest bugfix releases of 1.17.x and 1.18.x then it should be okay. Cheers, Ralf
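[Editor's note: the `next_double`/`state` pair being discussed can be poked at from Python through the BitGenerator's ctypes interface, which exposes the same `bitgen_t` members the Cython layer would consume. A quick sketch on a modern NumPy; the variable names are illustrative.]

```python
import numpy as np

# A BitGenerator wraps a bitgen_t struct; its .ctypes attribute exposes
# the struct's `state` pointer and `next_double` function pointer - the
# two members the UNU.RAN wrapper needs to pull uniform variates.
bitgen = np.random.PCG64(123)
iface = bitgen.ctypes  # ctypes view of the underlying bitgen_t

# Each call advances the generator state and yields a uniform in [0, 1).
u = iface.next_double(iface.state)
draws = [iface.next_double(iface.state) for _ in range(5)]
```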
On Sat, Jun 12, 2021 at 4:45 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I think this is fine - if it helps that much, let's just try it. If it passes all tests with the latest bugfix releases of 1.17.x and 1.18.x then it should be okay.
While refactoring the UNU.RAN wrapper to address the recent memory-leak issue, I tried to do this, but it turns out 1.17 doesn't ship the C/Cython API. So I have kept things as-is for now. Sorry for the noise here!
Participants (3):
- Pamphile Roy
- Ralf Gommers
- Tirth Patel