[Numpy-discussion] Addition of new distributions: Polya-gamma

Robert Kern robert.kern at gmail.com
Mon Dec 28 13:06:33 EST 2020


My view is that we will not add more non-uniform distribution methods (i.e.
"named" statistical probability distributions like Polya-Gamma) to
`Generator`. I think that we might add a couple more methods to handle some
more fundamental issues (like sampling from the unit interval with control
over whether each boundary is open or closed, maybe one more variation on
shuffling) that help in writing randomized algorithms. Now that we have the
C and Cython APIs, which allow one to implement non-uniform distributions in
other packages, we strongly encourage that.
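
To make the unit-interval case concrete, here is a minimal sketch of how the
four open/closed variants can be built today from existing `Generator`
methods. The helper name `unit_interval` and its keyword arguments are
hypothetical, not a NumPy API, and the grid resolutions (2**53 and 2**52)
are one reasonable choice among several.

```python
import numpy as np


def unit_interval(rng, size=None, closed_low=True, closed_high=False):
    """Draw floats from the unit interval with selectable boundaries.

    Hypothetical helper built on existing Generator methods; it is not
    part of numpy.random.
    """
    if closed_low and not closed_high:
        # [0, 1): what Generator.random already provides.
        return rng.random(size)
    if not closed_low and closed_high:
        # (0, 1]: reflect the half-open interval.
        return 1.0 - rng.random(size)
    if closed_low and closed_high:
        # [0, 1]: integers 0..2**53 inclusive, scaled onto a uniform grid.
        return rng.integers(0, 2**53, size=size, endpoint=True) / 2.0**53
    # (0, 1): midpoints of 2**52 equal cells, so neither endpoint can occur.
    return (rng.integers(0, 2**52, size=size) + 0.5) / 2.0**52


rng = np.random.default_rng(12345)
u_open = unit_interval(rng, size=5, closed_low=False, closed_high=False)
u_closed = unit_interval(rng, size=5, closed_low=True, closed_high=True)
```

The open variant is handy when the result feeds `log(u)` or `log(1 - u)`,
where an exact 0 or 1 would produce an infinity.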

As I commented on the linked PR, `scipy.stats` would be a reasonable place
for a Polya-Gamma sampling function, even if it's not feasible to implement
an `rv_continuous` class for it. You have convinced me that the nature of
the Polya-Gamma distribution warrants this. The only issue is that scipy
still depends on a pre-`Generator` version of numpy. So I recommend
implementing this function in your own package with an eye towards
contributing it to scipy later.
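
As a rough illustration of what such a function in a separate package could
look like, here is a pure-Python sketch that takes a `Generator` as its
first argument. It is NOT the sampler from the linked PR or from [3]/[4]:
it only truncates the infinite sum-of-gammas representation of PG(b, c),
omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2/(4 pi^2)) with
g_k ~ Gamma(b, 1), so the draws are approximate and biased slightly low.
The function name and the `n_terms` parameter are assumptions made for the
example.

```python
import numpy as np


def polyagamma_truncated(rng, b=1.0, c=0.0, size=None, n_terms=200):
    """Approximate PG(b, c) draws from a truncated sum of gamma variates.

    Illustrative only: an exact sampler should be preferred in real use.
    `rng` is a numpy.random.Generator supplied by the caller.
    """
    shape = () if size is None else tuple(int(s) for s in np.atleast_1d(size))
    k = np.arange(1, n_terms + 1)
    # Denominators (k - 1/2)^2 + c^2 / (4 pi^2) of the series.
    denom = (k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2
    # One Gamma(b, 1) variate per series term and per requested draw.
    g = rng.standard_gamma(b, size=shape + (n_terms,))
    return (g / denom).sum(axis=-1) / (2.0 * np.pi**2)


rng = np.random.default_rng(2020)
draws = polyagamma_truncated(rng, b=1.0, c=2.0, size=10_000)
# The sample mean should be close to the known PG mean, b/(2c) * tanh(c/2).
expected_mean = 1.0 / (2.0 * 2.0) * np.tanh(2.0 / 2.0)
```

Passing the `Generator` in, rather than creating one inside the function,
leaves seeding and stream management to the caller. A C or Cython
implementation would follow the same shape but draw from the underlying bit
generator directly.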

On Sun, Dec 27, 2020 at 6:05 AM Zolisa Bleki <BLKZOL001 at myuct.ac.za> wrote:

> Hi All,
>
> I would like to know if NumPy accepts new distributions now that the
> Generator interface has been implemented. If so, what are the criteria
> for a particular distribution to be accepted? I'm asking because I would
> like to propose adding the Polya-gamma distribution to NumPy, for the
> following reasons:
>
> 1) Polya-gamma random variables are commonly used as auxiliary variables
> during data augmentation in Bayesian sampling algorithms, which are
> widely used in statistics and, more recently, machine learning.
> 2) Since this distribution is mostly useful for random sampling, it seems
> appropriate to have it in NumPy and not in projects like SciPy [1].
> 3) The only Python/C++ implementation of the sampler available is licensed
> under GPLv3, which I believe limits copying into packages that choose to
> use a different license [2].
> 4) NumPy's random API makes adding the distribution painless.
>
> I have done preliminary work on this by implementing the distribution
> sampler as described in [3]; see:
> https://github.com/numpy/numpy/compare/master...zoj613:polyagamma .
> There is a more efficient sampling algorithm described in a later paper
> [4], but I chose not to start with that one until I know it is worth
> investing time in.
>
> I would appreciate your thoughts on this proposal.
>
> Regards,
> Zolisa
>
>
> Refs:
> [1] https://github.com/scipy/scipy/issues/11009
> [2] https://github.com/slinderman/pypolyagamma
> [3] https://arxiv.org/pdf/1205.0310v1.pdf
> [4] https://arxiv.org/pdf/1405.0506.pdf


-- 
Robert Kern