scipy.stats: Sampling from an arbitrary probability distribution
![](https://secure.gravatar.com/avatar/68d20bbfa68c7d49d858bbce0618aef3.jpg?s=120&d=mm&r=g)
Hi all, I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats? Here is what I have already: import scipy.stats as st class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0 my_cv = my_pdf(name='my_pdf') Can I now somehow sample a random number from my_cv? Thanks in advance! Daniel
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Here is what I have already:
import scipy.stats as st
class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0
my_cv = my_pdf(name='my_pdf')
Can I now somehow sample a random number from my_cv?
you can define your own distribution with the pdf, then the generic methods will calculate the rvs http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc... It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough. (I still need to write an example for trying out the ppf interpolation version.) Josef
Thanks in advance!
Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Sun, Jun 3, 2012 at 10:07 AM, <josef.pktd@gmail.com> wrote:
On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Here is what I have already:
import scipy.stats as st
class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0
my_cv = my_pdf(name='my_pdf')
I didn't look at the example before x**2 makes a proper pdf only with a bounded support, so, when creating the instance, then the bounds .a and .b need to be given as keyword arguments. (or set as attributes) It also looks like ppf should have a closed form expression. (my algebra isn't good enough today to figure out bounds) Josef
Can I now somehow sample a random number from my_cv?
you can define your own distribution with the pdf, then the generic methods will calculate the rvs
http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc...
It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough.
(I still need to write an example for trying out the ppf interpolation version.)
Josef
Thanks in advance!
Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Sun, Jun 3, 2012 at 11:21 AM, <josef.pktd@gmail.com> wrote:
On Sun, Jun 3, 2012 at 10:07 AM, <josef.pktd@gmail.com> wrote:
On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Here is what I have already:
import scipy.stats as st
class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0
my_cv = my_pdf(name='my_pdf')
I didn't look at the example before
x**2 makes a proper pdf only with a bounded support, so, when creating the instance, then the bounds .a and .b need to be given as keyword arguments. (or set as attributes)
It also looks like ppf should have a closed form expression.
(my algebra isn't good enough today to figure out bounds)
I picked [-1, 1] for the bounds of the distribution, example with explicit ppf is much faster https://gist.github.com/2864348 Josef
Josef
Can I now somehow sample a random number from my_cv?
you can define your own distribution with the pdf, then the generic methods will calculate the rvs
http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc...
It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough.
(I still need to write an example for trying out the ppf interpolation version.)
Josef
Thanks in advance!
Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/68d20bbfa68c7d49d858bbce0618aef3.jpg?s=120&d=mm&r=g)
Thank you Josef and Nathaniel. That really helped!
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Jun 3, 2012 at 12:20 PM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Here is what I have already:
import scipy.stats as st
class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0
my_cv = my_pdf(name='my_pdf')
Can I now somehow sample a random number from my_cv?
The easiest and fastest way to generate random variates from a given distribution is to calculate the quantile function and then feed it random samples from the uniform distribution[1]. The catch is that computing the quantile function requires that you be able to calculate the integral of your PDF (the CDF), and then invert the CDF, so this method only applies for PDFs for which this is possible. But if your PDF is really as simple as the one in your example then this is a good approach :-). [1] https://en.wikipedia.org/wiki/Inverse_transform_sampling -- Nathaniel
![](https://secure.gravatar.com/avatar/86ea939a72cee216b3c076b52f48f338.jpg?s=120&d=mm&r=g)
On 03.06.2012 13:20, Daniel Sabinasz wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler. http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.) You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc Sturla
![](https://secure.gravatar.com/avatar/5bc3911d30ec66d8a65a05846502f0c4.jpg?s=120&d=mm&r=g)
Hi Maybe this article is of help http://www.google.com/url?sa=t&rct=j&q=plehn%20bruns&source=web&cd=2&ved=0CFoQFjAB&url=http%3A%2F%2Fwww.logistics-journal.de%2Fnot-reviewed%2F2005%2F7%2Fapproximation%2FApproximation_and_Computation_of_Random_Variables_using_Finite_Elements.pdf&ei=CRXNT4eyC6ik4gSoqYWPDA&usg=AFQjCNFLF-Zr3w6opNHWsM3HYGUaTqK8vA Nicky On Jun 4, 2012 3:22 PM, "Sturla Molden" <sturla@molden.no> wrote:
On 03.06.2012 13:20, Daniel Sabinasz wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler.
http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.)
You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc
Sturla _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/6fdbe558d5fc70735afc27d41ef9abb7.jpg?s=120&d=mm&r=g)
On Jun 4, 2012, at 3:21 PM, Sturla Molden wrote:
On 03.06.2012 13:20, Daniel Sabinasz wrote:
Hi all,
I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?
Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler.
http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.)
You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc
Sturla _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
If you are willing to look outside of scipy, there are nice methods to generate random numbers from arbitrary distributions in ROOT, a C++ physics data analysis package with python bindings: http://root.cern.ch/drupal/content/pyroot import ROOT # Define the function and limits you want: # TF1::TF1(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1) f = ROOT.TF1("my_pdf", "x * x / 10.", -1, 1) # Generate 100 random numbers from that distribution [f.GetRandom() for _ in range(100)] You can sample from an arbitrary 2D distribution as well: # TF2::TF2(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1, Double_t ymin = 0, Double_t ymax = 1) f2 = ROOT.TF2("my_pdf2", "x * x / 10. + pow(y, 4)", -1, 1, 3, 4) x, y = ROOT.Double(), ROOT.Double() f2.GetRandom2(x, y) If you only want a histogram of values, not the array, you can avoid the python call overhead: # TH1D::TH1D(const char* name, const char* title, Int_t nbinsx, Double_t xlow, Double_t xup) h = ROOT.TH1D("my_hist", "my_hist", 1000, -1, 1) # void TH1::FillRandom(const char* fname, Int_t ntimes = 5000) In [49]: %timeit h.FillRandom("my_pdf", int(1e6)) 10 loops, best of 3: 171 ms per loop In [48]: %timeit [f.GetRandom() for _ in range(int(1e6))] 1 loops, best of 3: 2.62 s per loop Here you can see the method used (parabolic approximations): http://root.cern.ch/root/html/src/TF1.cxx.html#gYdi6C Even if most users don't want to install ROOT, it might be worth comparing the accuracy / speed to the method in scipy. ROOT also contains the UNURAN package, which implements several methods to sample from arbitrary one- or multi-dimensional distributions. http://root.cern.ch/root/html/MATH_UNURAN_Index.html http://statmath.wu.ac.at/unuran/ Unfortunately it's GPL and doesn't have python bindings itself as far as I know. Christoph
participants (6)
-
Christoph Deil
-
Daniel Sabinasz
-
josef.pktd@gmail.com
-
Nathaniel Smith
-
nicky van foreest
-
Sturla Molden