why do the discrete distributions not have a `fit`?
Why do the discrete distributions not have a `fit` method like the continuous distributions? Currently it's a bug in the documentation: http://projects.scipy.org/scipy/ticket/1659

In statsmodels, we fit several of the discrete distributions.

How about discrete parameters? (in analogy to the erlang discussion)

hypergeom is based on a story about marbles or balls http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_exa... but why should we care? It's just a discrete distribution with 3 shape parameters, isn't it?

Fractional marbles?

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # pmf at k=5 with M=10.8 and N=8.5 held fixed, over a grid of fractional n
    nn = np.linspace(4.5, 8, 101)
    pmf = [stats.hypergeom.pmf(5, 10.8, n, 8.5) for n in nn]
    plt.plot(nn, pmf, '-o')
    plt.title("pmf of hypergeom as function of parameter n")

Doesn't look like there are any problems, and the likelihood function is nicely concave.

Conclusion: scipy.stats doesn't have a hypergeometric distribution, but a generalized version that is defined on a real parameter space.

Josef

(So what's the point? Sorry, I was just getting distracted while looking for `fit`.)
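A minimal sketch of what "defined on a real parameter space" can mean, assuming the binomial coefficients in the hypergeometric pmf are extended through the log-gamma function. The helpers `log_binom` and `hypergeom_pmf_real` are hypothetical names written for this illustration, not part of scipy.stats, and depending on the SciPy version `stats.hypergeom` itself may not accept non-integer shape parameters. The expression is only meaningful where every gamma argument is positive, which holds on the grid used here.

```python
import numpy as np
from scipy import special, stats

def log_binom(a, b):
    """Log of the binomial coefficient C(a, b), extended to real a, b via gammaln."""
    return special.gammaln(a + 1) - special.gammaln(b + 1) - special.gammaln(a - b + 1)

def hypergeom_pmf_real(k, M, n, N):
    """Hypothetical helper: hypergeometric pmf formula evaluated at real M, n, N."""
    return np.exp(log_binom(n, k) + log_binom(M - n, N - k) - log_binom(M, N))

# With integer parameters the formula should agree with scipy.stats.hypergeom.
pmf_int = hypergeom_pmf_real(5, 11, 7, 8)
pmf_ref = stats.hypergeom.pmf(5, 11, 7, 8)

# The same formula along the grid of fractional n used in the plot above.
nn = np.linspace(4.5, 8, 101)
pmf_frac = hypergeom_pmf_real(5, 10.8, nn, 8.5)
```

Comparing `pmf_int` with `pmf_ref` checks the integer case; `pmf_frac` can be plotted against `nn` in the same way as the snippet above.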
On Fri, May 11, 2012 at 2:04 AM, <josef.pktd@gmail.com> wrote:
> in statsmodels, we fit several of the discrete distributions.
Which ones? And do you then return non-integer parameters or not?
> conclusion: scipy.stats doesn't have a hypergeometric distribution, but a generalized version that is defined on a real parameter space.
For functions that work with continuous input, perhaps using the continuous fit and then looking for the best fit with integer parameters near the continuous optimum would work. I looked for literature on this topic, but didn't find anything useful yet.

Ralf
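The two-step idea above (continuous fit first, then integer candidates near the optimum) could be sketched as follows. This is only an illustration under assumptions that are not in the thread: the data are simulated binomial counts, the success probability `p_known` is treated as known, and only `n` is fitted. The log-likelihood `binom_nll` is a hypothetical helper written with `gammaln` so that `n` can be real-valued during the continuous step.

```python
import numpy as np
from scipy import optimize, special

def binom_nll(n, x, p):
    """Negative log-likelihood of Binomial(n, p); gammaln lets n be real-valued."""
    log_coef = (special.gammaln(n + 1) - special.gammaln(x + 1)
                - special.gammaln(n - x + 1))
    return -np.sum(log_coef + x * np.log(p) + (n - x) * np.log1p(-p))

# Simulated data (assumption for the sketch): true n = 20, p known.
rng = np.random.default_rng(0)
p_known = 0.3
x = rng.binomial(20, p_known, size=500)

# Step 1: continuous fit of n, restricted to n >= max(x) so the data stay in the support.
lower = float(x.max())
res = optimize.minimize_scalar(binom_nll, bounds=(lower, 10 * lower),
                               args=(x, p_known), method="bounded")
n_cont = res.x

# Step 2: compare the integer neighbours of the continuous optimum.
candidates = [c for c in (int(np.floor(n_cont)), int(np.ceil(n_cont)))
              if c >= x.max()]
n_int = min(candidates, key=lambda c: binom_nll(c, x, p_known))
```

With several integer parameters the candidate set would be the grid of neighbouring integer points rather than two values, and checking only the nearest neighbours is a heuristic, not a guarantee of the integer optimum.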
On Sun, May 20, 2012 at 1:14 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>> in statsmodels, we fit several of the discrete distributions.
> Which ones? And do you then return non-integer parameters or not?
statsmodels.discrete: Logit, Poisson, Negative Binomial (in GLM, but unfinished in discrete) http://statsmodels.sourceforge.net/devel/discretemod.html

There is no integer restriction on the parameters, since I have never seen one in the literature.
> For functions that work with continuous input, perhaps using the continuous fit and then looking for the best-fit with integer params near the continuous optimum would work. I looked for literature on this topic, but didn't find anything useful yet.
If hypergeometric works for continuous parameters (and, from a quick look, I guess boltzmann and binom do too), then I don't see much reason to restrict parameters to integers.

I played a bit more with hypergeom.fit, and the main problem is finding starting parameters that are consistent with the data, since the support moves with the parameters even if loc and scale are fixed.

For many discrete distributions the problem might be more how to restrict the parameter space, e.g. if a shape parameter is in (0, 1): a Logit or Probit transformation for Bernoulli, or an exp transformation for shape parameters > 0 in Poisson. Or maybe not, if the optimization routine always finds an interior solution.

Josef
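The parameter transformations mentioned above can be sketched like this: optimize over an unconstrained variable and map it into the restricted parameter space (exp for a positive Poisson mean, the logistic function for a Bernoulli probability). The data are simulated and the function names are hypothetical; this is a sketch of the general idea, not how statsmodels or scipy implement fitting.

```python
import numpy as np
from scipy import optimize, special, stats

# Simulated data (assumption for the sketch).
rng = np.random.default_rng(1)
counts = rng.poisson(3.5, size=400)
flips = rng.binomial(1, 0.25, size=400)

def poisson_nll(theta):
    """Negative log-likelihood with mu = exp(theta), so mu > 0 for any real theta."""
    mu = np.exp(theta[0])
    return -stats.poisson.logpmf(counts, mu).sum()

def bernoulli_nll(theta):
    """Negative log-likelihood with p = expit(theta), so p stays inside (0, 1)."""
    p = special.expit(theta[0])
    return -stats.bernoulli.logpmf(flips, p).sum()

# Unconstrained optimization, then map back to the original parameter.
mu_hat = np.exp(optimize.minimize(poisson_nll, x0=[0.0]).x[0])
p_hat = special.expit(optimize.minimize(bernoulli_nll, x0=[0.0]).x[0])
```

For both models the maximum likelihood estimate is the sample mean, so `mu_hat` and `p_hat` can be checked against `counts.mean()` and `flips.mean()`.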
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user