[SciPy-User] log pdf, cdf, etc
josef.pktd at gmail.com
Fri May 28 22:53:37 EDT 2010
On Fri, May 28, 2010 at 9:03 PM, Chris Strickland
<christophermarkstrickland at gmail.com> wrote:
>
>
> On Sat, May 29, 2010 at 12:15 AM, <josef.pktd at gmail.com> wrote:
>>
>> It would need a new method for each distribution, e.g. _loglike, _logpdf
>> So, this is work, and for some distributions the log wouldn't simplify
>> much.
>>
> I am not sure what you mean by the log wouldn't simplify much.
>
>>
>> I proposed this once together with other improvements (but without
>> response).
>>
> This is a little disappointing, it significantly reduces how useful the
> library is. In actual fact I have not been able to use a single function for
> anything other than testing (although, I have been using numpy.random for
> random numbers, this scipy.stats collection seems far more complete). This
> would dramatically change if a log version of the distribution were
> available. I think in most cases this would be a straightforward addition at
> least for the pdf.
I don't think that, for many use cases, the performance and accuracy
hit from using log(stats.t.pdf), or the same for many other distributions,
would be large enough to make the library useless. At least, I haven't seen any other comments in
this direction.
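To make the trade-off concrete, here is a minimal sketch using only Python's standard library rather than scipy.stats. The names norm_pdf, norm_logpdf and naive_logpdf are made up for this illustration and are not existing scipy methods. For moderate arguments, log(pdf) and a directly coded log pdf agree to rounding error; they only part ways far out in the tails, where the pdf underflows to zero.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def norm_pdf(x):
    """Standard normal pdf (illustrative helper, not a scipy function)."""
    return math.exp(-0.5 * x * x - LOG_SQRT_2PI)

def norm_logpdf(x):
    """Log of the standard normal pdf, written out so exp() is never called."""
    return -0.5 * x * x - LOG_SQRT_2PI

def naive_logpdf(x):
    """log(pdf(x)); returns -inf once the pdf has underflowed to 0.0."""
    p = norm_pdf(x)
    return math.log(p) if p > 0.0 else -math.inf

# Compare the two approaches in the centre and in the far tail.
for x in (1.5, 10.0, 40.0):
    print(x, naive_logpdf(x), norm_logpdf(x))
```

Whether the tail case matters depends on the application: for test statistics it rarely does, while a likelihood evaluated at poor starting values can easily end up there.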
One of the main use cases for me of stats.distributions are all the
statistical test distributions, t, F, chi2 and so on. However, in
statsmodels we have a mixture of calls to the pdf/cdf of
stats.distributions and reimplementations of loglikelihood functions,
where the scipy version is also just used for testing.
>
>
>>
>> The second useful method for estimation would be _fitstart, which
>> provides distribution specific starting values for fit, e.g. a moment
>> estimator, or a simple rule of thumb
>> http://projects.scipy.org/scipy/ticket/808
>>
>>
>> Here are some of my currently planned enhancements to the distributions:
>>
>>
>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>>
>> but I just checked, it looks like I forgot to copy the _loglike method
>> that I started from my experimental scripts.
>>
>> For a few distributions, where this is possible, it would also be
>> useful to add the gradient with respect to the parameters, (or even
>> the Hessian). But this is currently mostly just an idea, since we need
>> some analytical gradients in the estimation of stats models.
>>
> This certainly would be nice as well.
>>
>> >
>> > If there is not, may I suggest that this feature be
>> > added?
>> > There is such an excellent range of distributions, each with such an
>> > impressive range of options, that it seems a shame to have to mostly manually
>> > code
>> > up the log of pdfs and often call the log of CDFs from R.
>>
>> So far I only thought about log pdf, because I wanted it for Maximum
>> Likelihood estimation.
>>
> It is also necessary for MCMC.
pymc has many distributions with loglike in fortran for speed, but for
most distributions only loglike and rvs are defined, if I remember
correctly.
>
>>
>> Do you have a rough idea for which distributions log cdf would work?
>> that is, for which distribution is an analytical or efficient
>> numerical expression possible.
>
> Not sure off the top of my head, as I mainly require only the pdf. I was,
> however, doing a little survival analysis the other day and it was
> required. The log of the survival and hazard functions would be nice also.
> So far I have only required the exponential (analytical), weibull
> (analytical), normal (numerical) and powernormal (analytical function of the
> log of the normal cdf). I just had a peek at the R source code for pnorm
> (R's code for the normal cdf). The function is not big and also licensed
> under the GNU public licence. I assume it could be fairly easily ported to
> scipy.
R's license, GPL, is incompatible with the license of scipy, BSD.
While they are allowed to look at our code, code that goes into scipy
cannot be based on GPL licensed code.
I've never seen it mentioned before that there is a direct function for
log(norm.cdf). Which functions and packages in R implement the
logarithm of the cdf of these distributions?
The cdf for several distributions (including normal) is implemented in
Fortran or C in scipy.special, and I've never seen a log version for
them.
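For the normal case, one way to get log(cdf) without going through the cdf itself is to switch to the standard asymptotic expansion of the lower tail once the argument is far enough to the left. The sketch below uses only Python's math module. The name log_norm_cdf, the cutoff of -20 and the number of series terms are my own choices for illustration, not something taken from scipy.special or from R.

```python
import math

SQRT2 = math.sqrt(2.0)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def log_norm_cdf(x):
    """Log of the standard normal cdf (illustrative sketch, hypothetical name)."""
    if x > 0.0:
        # cdf is close to 1: write it as 1 - sf and let log1p keep the small term.
        return math.log1p(-0.5 * math.erfc(x / SQRT2))
    if x > -20.0:
        # erfc is still comfortably representable here, so the plain formula works.
        return math.log(0.5 * math.erfc(-x / SQRT2))
    # Far left tail (assumed cutoff): cdf(x) ~ pdf(x) / (-x) * (1 - 1/x^2 + 3/x^4 - 15/x^6)
    x2 = x * x
    series = 1.0 - 1.0 / x2 + 3.0 / x2**2 - 15.0 / x2**3
    return -0.5 * x2 - LOG_SQRT_2PI - math.log(-x) + math.log(series)

print(log_norm_cdf(0.0))    # log(0.5)
print(log_norm_cdf(-40.0))  # finite, although the cdf itself underflows to 0.0
```

The truncated series is only an approximation, so a production version would need a more careful choice of cutoff and error bound than this sketch makes.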
>>
>> I also think that scipy.stats.distributions could be one of the best
>> (broadest, consistent) collection of univariate distributions that I
>> have seen so far, once we fill in some missing pieces.
>>
>> As a way forward, I think we could make the distributions into a
>> numerical encyclopedia by adding private methods to those
>> distributions where it makes sense, like log pdf, log cdf and I also
>> started to add characteristic functions to some distributions in my
>> experimental scripts.
>> If you have a collection of logpdf, logcdf, we could add a trac ticket for
>> this.
>
> I could fairly easily whip up a collection of functions to compute the logpdf
> for a large number of distributions. Not sure about the CDFs but I can look
> into it as well. The pdf's are definitely far more urgent for my own work. I
> am a bit busy at work though for the next three weeks so it would have to be
> after that.
I looked at some of the distributions, and logpdf could be calculated more
efficiently in many of them, and very often logcdf as well.
I opened a ticket for this:
http://projects.scipy.org/scipy/ticket/1184
I also saw that there are still smaller, numerical improvements
possible in several distributions.
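As an example of the kind of simplification I mean, here is a rough sketch for the Weibull distribution in plain Python. The function names and the (x, c, scale) signature are hypothetical and only for illustration; this is not the code attached to the ticket. The log pdf and log survival function drop out analytically, and the log cdf only needs expm1/log1p to stay accurate at both ends.

```python
import math

def weibull_logpdf(x, c, scale=1.0):
    """Log pdf of a Weibull with shape c and given scale, for x > 0."""
    log_z = math.log(x) - math.log(scale)  # log(x / scale)
    return math.log(c) - math.log(scale) + (c - 1.0) * log_z - math.exp(c * log_z)

def weibull_logsf(x, c, scale=1.0):
    """Log survival function: log(1 - cdf) is exactly -(x / scale)**c."""
    return -((x / scale) ** c)

def weibull_logcdf(x, c, scale=1.0):
    """Log cdf, log(1 - exp(-z)) with z = (x / scale)**c, evaluated stably."""
    z = (x / scale) ** c
    if z <= math.log(2.0):
        # Small z: 1 - exp(-z) would lose digits, expm1 does not.
        return math.log(-math.expm1(-z))
    # Large z: cdf is close to 1, so log1p keeps the small correction.
    return math.log1p(-math.exp(-z))

print(weibull_logpdf(1.5, 2.0, 3.0))
print(weibull_logcdf(1e-20, 1.0))  # about log(1e-20), where log(cdf) would give log(0)
```

The log hazard follows directly as weibull_logpdf minus weibull_logsf.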
Thanks,
Josef
>>
>> However, this would miss the generic broadcasting part of the public
>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
>> those because of the overhead.
>>
>>
>> I'm working on and off on this, so it's moving only slowly (and my
>> wishlist is big).
>> (for example, I was reading up on extreme value distributions in
>> actuarial science and hydrology to get a better overview over the
>> estimators.)
>>
>>
>> So, I'd really love to hear any ideas, feedback, and see contributions
>> to improving the distributions.
>>
>> Josef
>>
>>
>