Mailman 3 scipy.stats: Sampling from an arbitrary probability distribution - SciPy-User

scipy.stats: Sampling from an arbitrary probability distribution

older
why does scipy.stats.t.ppf return...

Daniel Sabinasz

3 Jun 2012 3 Jun '12

2:20 p.m.

Hi all, I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats? Here is what I have already: import scipy.stats as st class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0 my_cv = my_pdf(name='my_pdf') Can I now somehow sample a random number from my_cv? Thanks in advance! Daniel

Show replies by date

josef.pktd＠gmail.com

3 Jun 3 Jun

5:07 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:

...

Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Here is what I have already:

import scipy.stats as st

class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0

my_cv = my_pdf(name='my_pdf')

Can I now somehow sample a random number from my_cv?

you can define your own distribution with the pdf, then the generic methods will calculate the rvs http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc... It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough. (I still need to write an example for trying out the ppf interpolation version.) Josef

...

Thanks in advance!

Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

josef.pktd＠gmail.com

6:21 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On Sun, Jun 3, 2012 at 10:07 AM, <josef.pktd@gmail.com> wrote:

...

On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:

...
Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Here is what I have already:

import scipy.stats as st

class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0

my_cv = my_pdf(name='my_pdf')

I didn't look at the example before x**2 makes a proper pdf only with a bounded support, so, when creating the instance, then the bounds .a and .b need to be given as keyword arguments. (or set as attributes) It also looks like ppf should have a closed form expression. (my algebra isn't good enough today to figure out bounds) Josef

...

...
Can I now somehow sample a random number from my_cv?

you can define your own distribution with the pdf, then the generic methods will calculate the rvs

http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc...

It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough.

(I still need to write an example for trying out the ppf interpolation version.)

Josef

...
Thanks in advance!

Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

josef.pktd＠gmail.com

8:51 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On Sun, Jun 3, 2012 at 11:21 AM, <josef.pktd@gmail.com> wrote:

...

On Sun, Jun 3, 2012 at 10:07 AM, <josef.pktd@gmail.com> wrote:

...
On Sun, Jun 3, 2012 at 7:20 AM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:

...
Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Here is what I have already:

import scipy.stats as st

class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0

my_cv = my_pdf(name='my_pdf')

I didn't look at the example before

x**2 makes a proper pdf only with a bounded support, so, when creating the instance, then the bounds .a and .b need to be given as keyword arguments. (or set as attributes)

It also looks like ppf should have a closed form expression.

(my algebra isn't good enough today to figure out bounds)

I picked [-1, 1] for the bounds of the distribution, example with explicit ppf is much faster https://gist.github.com/2864348 Josef

...

Josef

...
...
Can I now somehow sample a random number from my_cv?

you can define your own distribution with the pdf, then the generic methods will calculate the rvs

http://stackoverflow.com/questions/10678546/creating-new-distributions-in-sc...

It will require a large number of calls to generate random numbers, numerically calculating ppf and cdf, so this won't be efficient. Using a linear interpolated ppf will be faster, and might be accurate enough.

(I still need to write an example for trying out the ppf interpolation version.)

Josef

...
Thanks in advance!

Daniel _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

Daniel Sabinasz

9:43 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

Thank you Josef and Nathaniel. That really helped!

Nathaniel Smith

5:18 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On Sun, Jun 3, 2012 at 12:20 PM, Daniel Sabinasz <d.sabinasz@googlemail.com> wrote:

...

Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Here is what I have already:

import scipy.stats as st

class my_pdf(st.rv_continuous): def _pdf(self,x): return x*x/10.0

my_cv = my_pdf(name='my_pdf')

Can I now somehow sample a random number from my_cv?

The easiest and fastest way to generate random variates from a given distribution is to calculate the quantile function and then feed it random samples from the uniform distribution[1]. The catch is that computing the quantile function requires that you be able to calculate the integral of your PDF (the CDF), and then invert the CDF, so this method only applies for PDFs for which this is possible. But if your PDF is really as simple as the one in your example then this is a good approach :-). [1] https://en.wikipedia.org/wiki/Inverse_transform_sampling -- Nathaniel

Sturla Molden

4 Jun 4 Jun

4:21 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On 03.06.2012 13:20, Daniel Sabinasz wrote:

...

Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler. http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.) You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc Sturla

nicky van foreest

11:07 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

Hi Maybe this article is of help http://www.google.com/url?sa=t&rct=j&q=plehn%20bruns&source=web&cd=2&ved=0CFoQFjAB&url=http%3A%2F%2Fwww.logistics-journal.de%2Fnot-reviewed%2F2005%2F7%2Fapproximation%2FApproximation_and_Computation_of_Random_Variables_using_Finite_Elements.pdf&ei=CRXNT4eyC6ik4gSoqYWPDA&usg=AFQjCNFLF-Zr3w6opNHWsM3HYGUaTqK8vA Nicky On Jun 4, 2012 3:22 PM, "Sturla Molden" <sturla@molden.no> wrote:

...

On 03.06.2012 13:20, Daniel Sabinasz wrote:

...
Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler.

http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.)

You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc

Sturla _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

Christoph Deil

5 Jun 5 Jun

12:05 p.m.

New subject: scipy.stats: Sampling from an arbitrary probability distribution

On Jun 4, 2012, at 3:21 PM, Sturla Molden wrote:

...

On 03.06.2012 13:20, Daniel Sabinasz wrote:

...
Hi all,

I need to sample a random number from a distribution whose probability density function I specify myself. Is that possible using scipy.stats?

Sampling a general distribution is typically an MCMC problem, that e.g. can be solved with the Metropolis-Hastings sampler.

http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

Because of its recursive nature, a Markov chain like this is better written in Cython, or you can use NumPy to run multiple chains in parallel. (I depends on how many samples you need, of course, anything below a million should be fast enough in Python.)

You might also take a look at PyMCMC: https://github.com/rdenham/pymcmc

Sturla _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user

If you are willing to look outside of scipy, there are nice methods to generate random numbers from arbitrary distributions in ROOT, a C++ physics data analysis package with python bindings: http://root.cern.ch/drupal/content/pyroot import ROOT # Define the function and limits you want: # TF1::TF1(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1) f = ROOT.TF1("my_pdf", "x * x / 10.", -1, 1) # Generate 100 random numbers from that distribution [f.GetRandom() for _ in range(100)] You can sample from an arbitrary 2D distribution as well: # TF2::TF2(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1, Double_t ymin = 0, Double_t ymax = 1) f2 = ROOT.TF2("my_pdf2", "x * x / 10. + pow(y, 4)", -1, 1, 3, 4) x, y = ROOT.Double(), ROOT.Double() f2.GetRandom2(x, y) If you only want a histogram of values, not the array, you can avoid the python call overhead: # TH1D::TH1D(const char* name, const char* title, Int_t nbinsx, Double_t xlow, Double_t xup) h = ROOT.TH1D("my_hist", "my_hist", 1000, -1, 1) # void TH1::FillRandom(const char* fname, Int_t ntimes = 5000) In [49]: %timeit h.FillRandom("my_pdf", int(1e6)) 10 loops, best of 3: 171 ms per loop In [48]: %timeit [f.GetRandom() for _ in range(int(1e6))] 1 loops, best of 3: 2.62 s per loop Here you can see the method used (parabolic approximations): http://root.cern.ch/root/html/src/TF1.cxx.html#gYdi6C Even if most users don't want to install ROOT, it might be worth comparing the accuracy / speed to the method in scipy. ROOT also contains the UNURAN package, which implements several methods to sample from arbitrary one- or multi-dimensional distributions. http://root.cern.ch/root/html/MATH_UNURAN_Index.html http://statmath.wu.ac.at/unuran/ Unfortunately it's GPL and doesn't have python bindings itself as far as I know. Christoph

4413

Age (days ago)

4415

Last active (days ago)

List overview

Download

8 comments

6 participants

participants (6)

Christoph Deil
Daniel Sabinasz
josef.pktd＠gmail.com
Nathaniel Smith
nicky van foreest
Sturla Molden

scipy.stats: Sampling from an arbitrary probability distribution

Daniel Sabinasz

josef.pktd＠gmail.com

josef.pktd＠gmail.com

josef.pktd＠gmail.com

Daniel Sabinasz

Nathaniel Smith

Sturla Molden

nicky van foreest

Christoph Deil

tags

participants (6)