[SciPy-User] Question about the behavior of hypergeom.cdf

josef.pktd at gmail.com
Sun Oct 8 23:24:04 EDT 2017


On Sun, Oct 8, 2017 at 9:59 PM, Juan Nunez-Iglesias <jni.soma at gmail.com>
wrote:

> I’m not familiar with this part of the codebase, but after a bit of
> playing around, I think this isn’t yet implemented, maybe because it’s too
> hard? The relevant call in scipy/stats/_distn_infrastructure.py is:
>
> -> 2775         m = arange(int(self.a), k+1)
>
> self.a is the lower bound of the cdf to be computed, in this case, an
> array [0, 0, 0] to match the input N = self.b = [2, 4, 6]. The problem of
> course is that there is no general way to vectorize np.arange, because you
> would get a jagged array.
>
> I hope someone with more knowledge of the library will chime in, but I
> think you’ll have to implement your own looping logic in the
> short-to-medium term…
>
> Juan.
>
> On 7 Oct 2017, 8:36 PM +1100, Giovanni Bonaccorsi <
> giovanni.bonaccorsi at gmail.com>, wrote:
>
> Hi everyone,
> first post here. I have a question about broadcasting in scipy.
>
> I'm trying to apply broadcasting with hypergeom. The problem is that while
> hypergeom.pmf() accepts my arrays without complaining, the hypergeom.cdf()
> function raises this error:
>
> TypeError: only length-1 arrays can be converted to Python scalars
>
> Now, I don't know if this is the expected behavior of the function, but
> from the documentation it doesn't seem that array arguments need any more
> care in hypergeom.cdf() than in hypergeom.pmf(). Is that right or have I
> missed something?
>
> I'm attaching a minimal example to reproduce the error. I'm using
> Python 2.7 through Anaconda with scipy and numpy updated to the latest
> releases.
>
> import numpy as np
> from scipy.stats import hypergeom
>
> # this works
> x = np.array([[1,2,3],[4,5,6],[11,12,13]])
> M,n,N = [20,7,12]
> hypergeom.pmf(x,M,n,N)
> hypergeom.cdf(x,M,n,N)
>
> # this works
> M=np.array([25])
> n=np.array([3,6,9])
> N=np.array([2,4,6])
> hypergeom.pmf(x,M,n,N)
>
> # this doesn't work
> hypergeom.cdf(x,M,n,N)
>
>
I haven't seriously looked at the distribution code in a while.

The reason was and, I guess, still is that the support of a distribution is
not vectorized, i.e. the bounds .a and .b.
This means that in distributions where the support depends on a shape
parameter, some methods are not vectorized in that parameter.
It might work in some cases, but there is no generic support for it in the
base classes. AFAIK .a and .b are always scalar.
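
To illustrate the jagged-array problem Juan describes, here is a small
numpy-only sketch. The names upper, lower and ranges are just made up for
the illustration and are not what the scipy code uses.

```python
import numpy as np

upper = np.array([2, 4, 6])   # per-element upper bounds, like N in the example
lower = np.zeros_like(upper)  # per-element lower bounds, all zero here

# One arange per element is fine on its own, but the pieces have
# different lengths, so they cannot be stacked into one rectangular array.
ranges = [np.arange(lo, hi + 1) for lo, hi in zip(lower, upper)]
lengths = [len(r) for r in ranges]
print(lengths)
```

A single arange call over the whole support only makes sense when the bounds
are scalars.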

Looks like https://github.com/scipy/scipy/issues/1320 is still open.
However, I don't remember much discussion about the discrete distributions
like hypergeom.
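
Until then, the looping that Juan mentions can be sketched with np.vectorize,
which broadcasts the arguments and then calls hypergeom.cdf once per element
with scalar parameters. This is only a rough workaround, not something that
exists in scipy; hypergeom_cdf_loop is a name I made up here, and it will be
slow for large arrays.

```python
import numpy as np
from scipy.stats import hypergeom


def hypergeom_cdf_loop(x, M, n, N):
    """Evaluate hypergeom.cdf element by element (hypothetical helper)."""
    # np.vectorize broadcasts x, M, n, N against each other and calls the
    # wrapped function with one scalar from each, so .a and .b stay scalar.
    scalar_cdf = np.vectorize(
        lambda xi, Mi, ni, Ni: hypergeom.cdf(xi, Mi, ni, Ni),
        otypes=[float],
    )
    return scalar_cdf(x, M, n, N)


x = np.array([[1, 2, 3], [4, 5, 6], [11, 12, 13]])
M = np.array([25])
n = np.array([3, 6, 9])
N = np.array([2, 4, 6])

result = hypergeom_cdf_loop(x, M, n, N)
print(result.shape)
```

The result has the broadcast shape of the inputs, the same as what
hypergeom.pmf(x, M, n, N) returns.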

Josef



>
> Thanks a lot for the support,
> Giovanni
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at python.org
> https://mail.python.org/mailman/listinfo/scipy-user
>
>