[Numpy-discussion] bug in stats.randint

Thu Apr 23 11:18:18 EDT 2009

On Thu, Apr 23, 2009 at 2:56 PM, <josef.pktd at gmail.com> wrote:

> On Thu, Apr 23, 2009 at 9:27 AM, Flavio Coelho <fccoelho at gmail.com> wrote:
> >
> > Hi,
> >
> > I stumbled upon something I think is a bug in scipy:
> >
> > In [4]: stats.randint(1.,15.).ppf([.1,.2,.3,.4,.5])
> > Out[4]: array([ 2.,  3.,  5.,  6.,  7.])
> >
> > When you pass float arguments to stats.randint and then call the ppf
> > method, you get an array of floats, which is clearly wrong. The rvs method
> > doesn't display this bug, so I think it is a matter of poor type checking
> > in the ppf implementation...
> >
>
> I switched to using floats intentionally, to have correct handling of
> inf and nans. The argument checking is generic for all discrete
> distributions and not special-cased. Nans are converted to zero when
> cast to integers, which is wrong and very confusing; inf raises an
> exception. I prefer correct numbers to correct types. See the examples
> below.

I understand. However, I couldn't find any parameterization that makes
randint return inf or nan, so it should be safe to make randint return
integers.
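
To illustrate the point, here is a minimal pure-Python sketch of the
quantile function of a discrete uniform distribution on low, ..., high - 1.
It is not scipy's implementation; the name randint_ppf and the closed-form
formula are assumptions made for illustration, and low and high are assumed
to be finite. For any q in [0, 1] the result is a finite, integer-valued
number, even when the parameters are floats.

```python
import math

def randint_ppf(q, low, high):
    """Quantile of the discrete uniform distribution on low, ..., high - 1.

    Hypothetical helper for illustration only, not scipy's code.
    Assumes low and high are finite.
    """
    if math.isnan(q) or q < 0.0 or q > 1.0:
        return math.nan
    # Smallest integer k with cdf(k) = (k - low + 1) / (high - low) >= q.
    return float(math.ceil(q * (high - low) + low) - 1)

quantiles = [randint_ppf(q, 1.0, 15.0) for q in (0.1, 0.2, 0.3, 0.4, 0.5)]
print(quantiles)
# Every quantile is finite and integer-valued, so an int cast loses nothing.
print(all(math.isfinite(x) and x == int(x) for x in quantiles))
```

With these inputs the sketch gives the same values as the Out[4] array
quoted above, only as floats.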

I tested the equivalent function for Poisson in R (qpois), and it also
returns Inf when the quantile is 1, so no problem there.

I found some weird behaviors with stats.poisson.ppf:

In [34]: stats.poisson.ppf([0,1])
Out[34]: array([ -1.,  Inf]) #R returns [0,inf]
In [35]: stats.poisson.ppf([0.,.1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/fccoelho/<ipython console> in <module>()

/usr/lib/python2.6/dist-packages/scipy/stats/distributions.pyc in ppf(self, q, *args, **kwds)
   4002             goodargs = argsreduce(cond, *((q,)+args+(loc,)))
   4003             loc, goodargs = goodargs[-1], goodargs[:-1]
-> 4004             place(output,cond,self._ppf(*goodargs) + loc)
   4005
   4006         if output.ndim == 0:

TypeError: _ppf() takes exactly 3 arguments (2 given)
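
The TypeError above is consistent with the shape parameter (the Poisson
mean) not having been passed to ppf, although that is an assumption on my
part. For reference, here is a small pure-Python sketch of a Poisson
quantile function that follows the R convention mentioned above (0 at q=0,
inf at q=1). The name poisson_ppf is hypothetical, this is not scipy's
implementation, and it assumes a moderate mean mu so that exp(-mu) does not
underflow.

```python
import math

def poisson_ppf(q, mu):
    """Smallest k with P(X <= k) >= q for X ~ Poisson(mu).

    Hypothetical helper for illustration only; returns floats so that
    inf and nan can be represented.
    """
    if math.isnan(q) or q < 0.0 or q > 1.0:
        return math.nan
    if q == 1.0:
        return math.inf
    k = 0
    pmf = math.exp(-mu)  # P(X = 0)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= mu / k    # P(X = k) from P(X = k - 1)
        if pmf == 0.0:   # terms underflowed; cdf cannot grow any further
            break
        cdf += pmf
    return float(k)

print([poisson_ppf(q, 1.0) for q in (0.0, 0.5, 1.0)])
```

Note that the mean is always passed explicitly here, e.g.
poisson_ppf(0.5, 1.0).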

I hope these bug reports help.
I think handling infinity and zero is always problematic; maybe the best we
can do is to aim for predictable behavior (possibly matching the behavior of
the equivalent R functions).

I think that if randint returned ints regardless of the type of the
parameters, it would make the behavior of my code a lot more predictable.
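
In the meantime, a conversion on the caller's side could look like the
sketch below. The helper name to_int_if_finite is hypothetical. It works on
plain Python floats to stay self-contained; with NumPy arrays, checking
np.isfinite(a).all() before a.astype(int) would play the same role. The
idea is to refuse nan and inf explicitly instead of letting them be
silently turned into 0 or a large negative number.

```python
import math

def to_int_if_finite(values):
    """Convert floats to ints, raising on nan/inf instead of mangling them.

    Hypothetical helper for illustration only.
    """
    values = list(values)
    bad = [v for v in values if not math.isfinite(v)]
    if bad:
        raise ValueError("cannot convert non-finite values to int: %r" % bad)
    return [int(v) for v in values]

print(to_int_if_finite([2.0, 3.0, 5.0, 6.0, 7.0]))
```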

>
>
> If you don't have inf and nans you can cast them to int yourself.
>
> Josef
>
> >>> aint = np.zeros(5,dtype=int)
> >>> aint[0]= np.nan
> >>> aint
> array([0, 0, 0, 0, 0])
> >>> aint[1]= np.inf
> Traceback (most recent call last):
>  File "<pyshell#134>", line 1, in <module>
> OverflowError: cannot convert float infinity to long
>
> >>> from scipy import stats
> >>> stats.poisson.ppf(1,1)
> inf
> >>> stats.poisson.ppf(2,1)
> nan
>
> >>> stats.poisson.ppf(1,1).astype(int)
> -2147483648
> >>> aint[2] = stats.poisson.ppf(1,1)
> Traceback (most recent call last):
>  File "<pyshell#140>", line 1, in <module>
> OverflowError: cannot convert float infinity to long

-- 
---
Flávio Codeço Coelho