[SciPy-User] defining new distributions with rv_continuous

Mon Apr 14 06:29:34 EDT 2014

A barebone example of subclassing:

>>> class TNC(stats.rv_continuous):
...    def _pdf(self, x):
...      return stats.norm.pdf(x) * 2.      # cf Josef's email
...
>>> tnc = TNC(name='foo', a=0.)
>>>
>>> tnc.rvs(size=3)
array([ 0.19215378,  2.43396508,  0.58672056])
>>>
>>> tnc.expect(lambda x: 1.)
0.9999999999999998

* Notice there's no need for explicit branching in _pdf, since this is
taken care of in generic pdf once you provide the `a` arg to ctor.
* While this works for small examples, you better be aware that the
generic rvs incurs a *lot* of overhead. You might want to define
explicit _ppf or _rvs.

Evgeni

On Mon, Apr 14, 2014 at 11:18 AM,  <josef.pktd at gmail.com> wrote:
> On Mon, Apr 14, 2014 at 6:13 AM, Evgeni Burovski
> <evgeny.burovskiy at gmail.com> wrote:
>> In this specific case, have a look at truncnorm:
>>
>> http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.truncnorm.html
>>
>>>>> import scipy.stats as stats
>>>>> import numpy as np
>>>>> stats.truncnorm.pdf(0.1, 0, np.inf)
>> 0.79390509495402362
>>>>> stats.truncnorm.pdf(-0.1, 0, np.inf)
>> 0.0
>>
>>
>>
>> On Mon, Apr 14, 2014 at 11:00 AM, Francesco De Gasperin <fdg at voo.it> wrote:
>>> Hi,
>>>
>>> I have some problems in defining a new distribution using rv_continuous.
>>> As an example I tried to create a gaussian defined only for positive
>>> values of the random variable.
>>>
>>> This is the code:
>>>
>>> ###########################################
>>> from scipy.stats import distributions
>>> class gauss_pos(distributions.rv_continuous):
>>>     def _pdf(self, x):
>>>          if x <= 0: return 0
>>>          sigma = 1.
>>>          c = 0.
>>>          return 1/(np.sqrt(2*np.pi)*sigma) * np.exp( -(x-c)**2 /
>>> (2*sigma**2) )
>>>
>>> t_obj = gauss_pos(name="gauss_pos")
>>> print (t_obj.rvs(size=10))
>>> #############################################
>>>
>>> but when I run it I always end up with:
>>> OverflowError: (34, 'Numerical result out of range')
>>>
>>> Then, I removed the "if x <= 0: return 0" line and I tried to define the
>>> "a" and "b" parameters when creating the t_obj. Defining only the "a=0"
>>> produces the same error, while using something as:
>>>
>>> t_obj = gauss_pos(a=0,b=2)
>>>
>>> instead produces a result of this type:
>>> [ 2.          2.          1.5460072   1.09621782  1.62184086  0.42107306
>>>    2.          1.09281742  0.82957403  2.        ]
>>>
>>> which I still do not understand... it seems that "b" is returned every
>>> time the constraint that the random var must be between a and b is
>>> violated...
>
> The specific problem is that your pdf is not normalized correctly to
> integrate to 1.
> If you truncate the normal distribution, then you need to normalize
> for the truncated probability.
>
> The ppf will not be correct, and it is used in to create the rvs in a
> generic way.
>
> As Evgeni pointed out truncnorm, the best way is to look at the source
> code for some distributions and use them as pattern.
>
> There is also halfnorm.
>
> Josef
>
>
>>>
>>> any help is appreciated!
>>> cheers,
>>> Francesco
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user