[SciPy-Dev] Deprecate planck distribution?
Christoph Baumgarten
christoph.baumgarten at gmail.com
Sat Jan 5 02:08:58 EST 2019
My main concern about planck is that I do not know it as an established
distribution name. I found Planck's law (
https://en.wikipedia.org/wiki/Planck%27s_law), but I don't recognize the
distribution implemented in SciPy there. Does anyone know the distribution
under that name?
In SciPy it is also called a discrete exponential. Normally, the geometric
distribution is described as the discrete analogue of the exponential (both
are memoryless), so this could be confusing for users.
The implementation of geom in SciPy is based on geometric in NumPy, so my
guess is that it has a better sampling method than planck, whose sampling is
based on the ppf.
We can also leave the different parametrization in stats and explain it in
the docstring.
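
A minimal sketch of the two points above, assuming only the relation
p = 1 - exp(-lambda) from the original post. It checks that planck agrees
with a geom shifted by loc=-1, and draws Planck variates from NumPy's
geometric sampler instead of the ppf. The variable names and the seed are
arbitrary, and this is not meant to describe how SciPy implements either
distribution internally.

```python
import numpy as np
from scipy.stats import geom, planck

lam = 0.5                    # planck shape parameter (illustrative value)
p = 1.0 - np.exp(-lam)       # matching success probability for geom
k = np.arange(20)

# planck has support {0, 1, ...}; geom has support {1, 2, ...}, hence loc=-1.
max_diff = np.max(np.abs(planck.pmf(k, lam) - geom.pmf(k, p, loc=-1)))

# Draw Planck variates via NumPy's geometric sampler, shifted down by one.
rng = np.random.default_rng(0)
samples = rng.geometric(p, size=100_000) - 1

print(max_diff)
print(samples.mean(), planck.mean(lam))
```

The sample mean should come out close to planck.mean(lam), which is
1 / (exp(lambda) - 1).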
Christoph
On Thu, Jan 3, 2019 at 10:30 PM <scipy-dev-request at python.org> wrote:
> Today's Topics:
>
> 1. Re: add johnson SL distribution (josef.pktd at gmail.com)
> 2. Re: Deprecate planck distribution? (josef.pktd at gmail.com)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 3 Jan 2019 15:57:26 -0500
> From: josef.pktd at gmail.com
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] add johnson SL distribution
> Message-ID:
> <CAMMTP+BXHOf33E3CxzM9YSpaHKtV189hqmAP=
> xNSuRn4b6okWQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Thu, Jan 3, 2019 at 3:54 PM <josef.pktd at gmail.com> wrote:
>
> >
> >
> > On Thu, Jan 3, 2019 at 3:31 PM Matt Haberland <haberland at ucla.edu> wrote:
> >
> >> I am not personally familiar with the Johnson family of distributions
> >> <
> https://books.google.com/books?id=_LvgBwAAQBAJ&pg=PA197&lpg=PA197&dq=johns+su+sb+sl+distributions&source=bl&ots=LBowBmYTse&sig=9KPViyvSlLAFp9EYqi-ejTYgQ30&hl=en&sa=X&ved=2ahUKEwjE6cnvt9LfAhWG458KHdrQAmkQ6AEwDXoECAIQAQ#v=onepage&q=johns%20su%20sb%20sl%20distributions&f=false
> >,
> >> but the SL does seem to complete the set.
> >>
> >> The license for the Matlab implementation does seem to be BSD 3-clause
> >> <https://en.wikipedia.org/wiki/BSD_licenses#3-clause> and thus
> >> compatible with SciPy.
> >>
> >> Seems like a reasonable first issue, but certainly finishing stalled PRs
> >> would be helpful, too!
> >>
> >> Matt Haberland
> >>
> >> On Thu, Jan 3, 2019 at 10:09 AM Michael Watson <
> >> mike.watson at sheffield.ac.uk> wrote:
> >>
> >>> Hi all, happy new year,
> >>> We have the SB and SU Johnson distributions implemented but not the SL
> >>> distribution. It doesn't look like much work to add it if it's
> >>> appropriate. I'm doing some work with these distributions and
> >>> ultimately would like to implement functions to fit by moments and by
> >>> quantiles too. There are existing implementations distributed under
> >>> the BSD licence here:
> >>>
> >>>
> >>>
> https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox
> >>>
> >>> So it doesn't seem like a big job from my point of view, and I'll be
> >>> doing it anyway.
> >>>
> >>> It would also be my first contribution, so if it would be better to
> >>> start with another issue (I saw a list and 2 stalled PRs in another
> >>> email) and then try to add functionality, just say so and I can look
> >>> at contributing in other ways first.
> >>>
> >>
> > On adding new distributions in general:
> >
> > The speed of getting a new distribution in depends a lot on how well it
> > fits into the general distribution pattern and whether all core methods
> > are available as closed-form expressions or via scipy.special functions.
> > If that is the case, then adding a new distribution is easy.
> > If that is not the case, then it can be difficult to get a good version
> > merged. One difficult case is when the pdf is only available as a
> > computationally expensive numerical approximation.
> >
> > In general, the distributions only have a fit method based on maximum
> > likelihood estimation of the parameters (which might reduce to the
> > method of moments in special cases).
> >
> > Based on a quick search, it looks like JohnsonSL is just the log-normal
> > distribution (as a loc-scale family, which is available in scipy)
> >
>
> scipy lognorm is a 3-parameter family; maybe there should also be a
> 4-parameter family
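
A small sketch of why the four Johnson SL parameters collapse onto the
three of lognorm. It assumes the commonly quoted SL form
z = gamma + delta * log((x - xi) / lam) with z standard normal. That form,
the helper johnson_sl_cdf and the parameter values are assumptions made for
illustration, not taken from the thread or from SciPy. Under that form,
gamma and lam only enter through the combined scale lam * exp(-gamma / delta).

```python
import numpy as np
from scipy.stats import lognorm, norm

def johnson_sl_cdf(x, gamma, delta, xi=0.0, lam=1.0):
    """CDF of the assumed Johnson SL form (hypothetical helper)."""
    return norm.cdf(gamma + delta * np.log((x - xi) / lam))

gamma, delta, xi, lam = 0.7, 1.5, 2.0, 3.0   # arbitrary example values
x = np.linspace(xi + 0.1, xi + 30.0, 50)

# Equivalent 3-parameter lognorm: s = 1/delta, loc = xi,
# scale = lam * exp(-gamma / delta).
equiv = lognorm(1.0 / delta, loc=xi, scale=lam * np.exp(-gamma / delta))

max_diff = np.max(np.abs(johnson_sl_cdf(x, gamma, delta, xi, lam) - equiv.cdf(x)))
print(max_diff)
```

So under this assumption a 4-parameter version would be a convenience
parameterization rather than a larger family.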
>
>
> >
> > Josef
> >
> >
> >> Mike
> >>> _______________________________________________
> >>> SciPy-Dev mailing list
> >>> SciPy-Dev at python.org
> >>> https://mail.python.org/mailman/listinfo/scipy-dev
> >>>
> >>
> >>
> >> --
> >> Matt Haberland
> >> Assistant Adjunct Professor in the Program in Computing
> >> Department of Mathematics
> >> 6617A Math Sciences Building, UCLA
> >>
> >
>
> ------------------------------
>
> Message: 2
> Date: Thu, 3 Jan 2019 16:29:22 -0500
> From: josef.pktd at gmail.com
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Deprecate planck distribution?
> Message-ID:
> <CAMMTP+A=AtSWy8XH9FsM8mjZ=
> HNQvX9b572p4SWMF765D6sJYw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Thu, Jan 3, 2019 at 9:22 AM Ali Cetin <ali.cetin at outlook.com> wrote:
>
> >
> >
> > ------------------------------
> > *From:* SciPy-Dev <scipy-dev-bounces+ali.cetin=outlook.com at python.org>
> > on behalf of Robert Kern <robert.kern at gmail.com>
> > *Sent:* Wednesday, January 2, 2019 21:07
> > *To:* SciPy Developers List
> > *Subject:* Re: [SciPy-Dev] Deprecate planck distribution?
> >
> > On Wed, Jan 2, 2019 at 1:36 AM Christoph Baumgarten <
> > christoph.baumgarten at gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > happy new year!
> > >
> > > I noted that the Planck distribution is a geometric distribution with a
> > > different parametrization, see Issue #9359:
> > >
> > > import numpy as np
> > > from scipy.stats import planck, geom
> > >
> > > a = 0.5
> > > k = np.arange(20)
> > > sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a)))  # 1.30e-18
> > >
> > > I don't know if there is a specific reason to have the Planck
> > > distribution in addition to the geometric. If not, I would propose to
> > > deprecate it.
> > >
> > > Any views? Thanks
> >
> > If we were to turn back time, and the question was whether to *add* the
> > Planck distribution given that we had the geometric distribution, I would
> > probably be convinced by this. However, given that the Planck
> > distribution has already been added, I don't think that it's worth
> > removing it. The marginal cost of having this alternate parameterization
> > is likely less than the cost of anyone changing their code.
> >
> > The collection of probability distributions is also a place where some
> > nontrivial duplication actually has some positive value. People typically
> > come to `scipy.stats` with a distribution (with a name and specific
> > parameterization conventions) already in mind. Having more than one
> > parameterization available helps people recognize the distribution that
> > they want; having an alternate present doesn't impair the search task,
> > while not having the one they are looking for (or burying it in the Notes
> > of the docstring of the canonical version) can make the search task much
> > harder. It's a common complaint that `scipy.stats` doesn't expose certain
> > common parameterizations of distributions, so we should probably be
> > working to expand the collection of parameterizations rather than
> > collapsing them.
> >
> >
> > Robert Kern
> >
> > I agree with Robert on this one. If you want to go down that rat hole,
> > you will quickly find that most distribution functions are mere special
> > cases and/or alternative parameterizations of a few general classes of
> > distributions. If the concern is code management, then it could be argued
> > that an effort should be made to abstract distribution functions from
> > these more general classes. However, personally, I prefer transparency
> > and consistency with established literature when it comes to
> > parametrization.
> >
>
> I think there is a good reason for implementing special cases instead of
> only general cases: computational simplifications can then be used, and
> using only a general distribution with several extra parameters is
> cumbersome and requires a lot more work from the user, e.g. in setting all
> the extra parameters to their special-case values.
>
> This is not the case for pure reparameterizations that still have the same
> number of parameters.
>
> The main straitjacket in scipy.stats in terms of parameterization is that
> all continuous distributions use the loc-scale (plus possibly shape)
> parameterization.
> I think there are enough maintainers now (where I don't count myself) that
> it would be feasible to add other distribution classes that don't have to
> follow the loc-scale parameterization, or that could be intermediate
> classes for groups of similar distributions.
>
> For example, I think something similar to the frozen distribution class
> could be added that is just a Reparameterization class, i.e. one that
> internally delegates to a standard scipy distribution but uses a
> parameterization and parameter transformation that is more common and more
> familiar to users.
> Another advantage of reparameterization classes would be that estimation is
> often easier or more interpretable in a different parameterization. E.g.
> statsmodels uses negativebinomial in the mean-dispersion parameterization
> instead of the common negbin parameterization.
> A further advantage is that the Hessian, and hence the covariance of the
> parameter estimates, often has a nicer shape in a different
> parameterization.
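
A rough sketch of the reparameterization idea, not an existing SciPy or
statsmodels API. The helper nbinom_mean_dispersion is hypothetical, and it
assumes the mean-dispersion form in which the variance is
mu + alpha * mu**2. It only transforms (mu, alpha) into the (n, p) that
scipy.stats.nbinom expects and returns the frozen distribution, so all
methods are delegated to the standard implementation.

```python
from scipy.stats import nbinom

def nbinom_mean_dispersion(mu, alpha):
    """Frozen nbinom from mean mu and dispersion alpha (hypothetical helper).

    Assumes var = mu + alpha * mu**2, which gives n = 1/alpha and
    p = n / (n + mu) in scipy's (n, p) parameterization.
    """
    n = 1.0 / alpha
    p = n / (n + mu)
    return nbinom(n, p)

dist = nbinom_mean_dispersion(mu=4.0, alpha=0.5)
mean, var = dist.stats(moments="mv")
print(float(mean), float(var))
```

A full Reparameterization class would also have to map fitted parameters
back and forth; the function above only covers the forward transformation.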
>
> An example of an intermediate class would be common support for
> distributions that are created by a transformation of another, mainly
> normal, distribution.
> This includes the Johnson system of distributions in the other open thread
> on the list.
>
> (Just some thoughts, I'm currently not in this neighborhood of stats.)
>
> Josef
>
>
>
> >
> > That's my two cents on the issue.
> >
> > Cheers,
> > Ali Cetin
> >
>
> ------------------------------
>
> End of SciPy-Dev Digest, Vol 183, Issue 6
> *****************************************
>