
Hi,

Below are two proposals to handle the documentation of the scipy distributions.

The first is to add a set of examples to each distribution; see the list at the end of this mail for an example. However, I actually wonder whether it wouldn't be better to put this material in the stats tutorial. (I recently updated the tutorial, but given the list below, it is still not complete.) The list below is a bit long... too long perhaps.

I actually get the feeling that, given the enormous diversity of the distributions, it may not be possible to automatically generate a set of simple examples that work for each and every distribution. Such examples would then involve the usage of x.dist.b, and so on, and this is not particularly revealing to first (and second) time users.

A possible resolution is to include just one or two generic examples in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer to the tutorial for the rest. The tutorial should then show extensive examples for each method of the norm distribution. I assume that any user of other distributions can then figure out how to proceed for his/her own distribution.

The second possibility would be to follow Josef's suggestion:

--snip snip
Splitting up the distributions pdf docs in tutorial into separate pages for individual distributions, make them nicer with code and graphs and link them from the docstring of the distribution.

This would keep the docstring itself from blowing up, but we could get the full html reference if we need to.
--snip snip

This idea offers a lot of opportunities. In a previous mail I mentioned that I don't quite like that the documentation is spread over multiple documents. There are doc strings in distributions.py (leading to a bloated file), and there is continuous.rst. Part of the implementation can be understood from the doc-string, typically the density function, but not the rest; this requires continuous.rst. Besides this, in case some specific distribution requires extra explanation/examples, this will have to be put in the doc-string, making distributions.py longer still.

Thus, to take up Josef's suggestion, what about a documentation file organised like this:

    # some tag to tell that these are the docs for the norm distribution
    # eg.
    # norm_gen

    Normal Distribution
    ----------------------------

    Notes
    ^^^^^^^
    # should be used by the interpreter

    The probability density function for `norm` is::

        norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

    Simple Examples
    ^^^^^^^^^^^^^^^^^^^^
    # used by the interpreter

    >>> norm.rvs( size = (2,3) )

    Extensive Examples
    ^^^^^^^^^^^^^^^^^^^^^^^^
    # Not used by the interpreter, but certainly by a html viewer, containing graphs, hard/specific examples.

    Mathematical Details
    ^^^^^^^^^^^^^^^^^^^^^^

    Stuff from continuous.rst

    # dist2_gen

    Distribution number 2
    -----------------------------------------
    etc

It shouldn't be too hard to parse such a document, and couple each piece of documentation to a distribution in distributions.py (or am I mistaken?) as we use the class name as the tag in the documentation file. The doc-string for a distribution in distributions.py can then be removed.

Nicky

Example for the examples section of the docstring of norm.

    Notes
    -----
    The probability density function for `norm` is::

        norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

    #%(example)s

    Examples
    --------

    Setting the mean and standard deviation:

    >>> from scipy.stats import norm
    >>> norm.cdf(0.0)
    >>> norm.cdf(0., 1)  # set mu = loc = 1
    >>> norm.cdf(0., 1, 2)  # mu = loc = 1, scale = sigma = 2
    >>> norm.cdf(0., loc = 1, scale = 2)  # mu = loc = 1, scale = sigma = 2

    Frozen rvs

    >>> norm(1., 2.).cdf(0)
    >>> x = norm(scale = 2.)
    >>> x.cdf(0.0)

    Moments

    >>> norm(loc = 2).stats()
    >>> norm.mean()
    >>> norm.moment(2, scale = 3.)
    >>> x.std()
    >>> x.var()

    Random number generation

    >>> norm.rvs(3, 1, size = (2,3))  # loc = 3, scale = 1, array of shape (2,3)
    >>> norm.rvs(3, 1, size = [2,3])
    >>> x.rvs(3)  # array with 3 random deviates
    >>> x.rvs([3,4])  # array of shape (3,4) with deviates

    Expectations

    >>> norm.expect(lambda x: x, loc = 1)  # 1.00000
    >>> norm.expect(lambda x: x**2, loc = 1., scale = 2.)  # second moment

    Support of the distribution

    >>> norm.a  # left limit, -np.inf here
    >>> norm.b  # right limit, np.inf here

    Plot of the cdf

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> x = np.linspace(0, 3)
    >>> P = norm.cdf(x)
    >>> plt.plot(x,P)
    >>> plt.show()
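
A brief aside on the loc and scale arguments used in the examples above: for norm, loc is the mean and scale is the standard deviation, so standardising by hand should agree with passing them as arguments. The following is only a minimal sketch; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

x, mu, sigma = 0.0, 1.0, 2.0

# Passing loc/scale positionally or by keyword should give the same value.
by_position = norm.cdf(x, mu, sigma)
by_keyword = norm.cdf(x, loc=mu, scale=sigma)

# Standardising by hand: cdf(x; loc, scale) equals cdf((x - loc) / scale).
by_hand = norm.cdf((x - mu) / sigma)

print(by_position, by_keyword, by_hand)
```

The same shift-and-scale convention applies to the other continuous distributions in scipy.stats, which is one reason a single worked example for norm may carry over reasonably well.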

On Sun, Sep 16, 2012 at 11:10 PM, nicky van foreest <vanforeest@gmail.com> wrote:

> Hi,
>
> Below are two proposals to handle the documentation of the scipy distributions.
>
> The first is to add a set of examples to each distribution; see the list at the end of this mail for an example. However, I actually wonder whether it wouldn't be better to put this material in the stats tutorial. (I recently updated the tutorial, but given the list below, it is still not complete.) The list below is a bit long... too long perhaps.
>
> I actually get the feeling that, given the enormous diversity of the distributions, it may not be possible to automatically generate a set of simple examples that work for each and every distribution. Such examples would then involve the usage of x.dist.b, and so on, and this is not particularly revealing to first (and second) time users.

This is exactly what the problem is currently.

> A possible resolution is to include just one or two generic examples in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer to the tutorial for the rest. The tutorial should then show extensive examples for each method of the norm distribution. I assume that any user of other distributions can then figure out how to proceed for his/her own distribution.

This is a huge amount of work, and the generic example still won't run if you copy-paste it into a terminal.
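
To make the copy-paste problem concrete: a generic line such as dist.rvs(size = (2,3)) names no actual distribution and omits any shape parameters the distribution requires. A hedged sketch of what a runnable version could look like for gamma, which takes one shape parameter (the value 1.99 is an arbitrary choice for illustration):

```python
from scipy.stats import gamma

# gamma needs one shape parameter; `shapes` and `numargs` report this.
print(gamma.shapes, gamma.numargs)

a = 1.99  # arbitrary illustrative shape value
sample = gamma.rvs(a, size=(2, 3))
print(sample.shape)
```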
> The second possibility would be to follow Josef's suggestion:
>
> --snip snip
> Splitting up the distributions pdf docs in tutorial into separate pages for individual distributions, make them nicer with code and graphs and link them from the docstring of the distribution.

Linking to the tutorial from the docstrings is a good idea, but the docstrings themselves should be enough to get started.

> This would keep the docstring itself from blowing up, but we could get the full html reference if we need to.
> --snip snip
> This idea offers a lot of opportunities. In a previous mail I mentioned that I don't quite like that the documentation is spread over multiple documents. There are doc strings in distributions.py (leading to a bloated file),

It's not that bad imho. The typical docstring looks like:

    """A beta prime continuous random variable.

    %(before_notes)s

    Notes
    -----
    The probability density function for `betaprime` is::

        betaprime.pdf(x, a, b) = gamma(a+b) / (gamma(a)*gamma(b)) * x**(a-1) * (1+x)**(-a-b)

    for ``x > 0``, ``a > 0``, ``b > 0``.

    %(example)s

    """

It can't be much shorter than that.

> and there is continuous.rst. Part of the implementation can be understood from the doc-string, typically the density function, but not the rest;

The pdf and support are given; that's enough to define the distribution. So that should stay. It doesn't mean we have to copy the whole Wikipedia page for each distribution.
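
The %(before_notes)s and %(example)s placeholders in the docstring quoted above are filled in with shared text when the distribution is set up. The sketch below only illustrates the idea with plain Python %-formatting; the template text and the dictionary contents are invented for illustration and are not the actual strings or machinery used in distributions.py.

```python
# Hypothetical template with the two placeholders mentioned above.
template = """A beta prime continuous random variable.

%(before_notes)s

Notes
-----
The pdf and support go here.

%(example)s
"""

# Hypothetical shared fragments; the real ones are much longer.
fragments = {
    "before_notes": "Methods\n-------\nrvs(a, b, size=1)\npdf(x, a, b)",
    "example": "Examples\n--------\n>>> from scipy.stats import betaprime",
}

docstring = template % fragments
print(docstring)
```

This is why the per-distribution source text can stay short: only the summary line and the Notes section differ between distributions.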
> this requires continuous.rst. Besides this, in case some specific distribution requires extra explanation/examples, this will have to be put in the doc-string, making distributions.py longer still. Thus, to take up Josef's suggestion, what about a documentation file organised like this:

Are you suggesting a reST page here, or a .py file with only docs, and new magic to make part of the content show up as docstring? The former sounds better to me.
> # some tag to tell that these are the docs for the norm distribution
> # eg.
> # norm_gen
>
> Normal Distribution
> ----------------------------
>
> Notes
> ^^^^^^^
> # should be used by the interpreter
>
> The probability density function for `norm` is::
>
>     norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
>
> Simple Examples
> ^^^^^^^^^^^^^^^^^^^^
> # used by the interpreter
>
> >>> norm.rvs( size = (2,3) )
>
> Extensive Examples
> ^^^^^^^^^^^^^^^^^^^^^^^^
> # Not used by the interpreter, but certainly by a html viewer, containing graphs, hard/specific examples.
>
> Mathematical Details
> ^^^^^^^^^^^^^^^^^^^^^^
>
> Stuff from continuous.rst
>
> # dist2_gen
>
> Distribution number 2
> -----------------------------------------
> etc
>
> It shouldn't be too hard to parse such a document, and couple each piece of documentation to a distribution in distributions.py (or am I mistaken?) as we use the class name as the tag in the documentation file. The doc-string for a distribution in distributions.py can then be removed.
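
A rough sketch of the parsing step described above, assuming each distribution's section starts with a tag line of the form `# <class name>` where the class name ends in `_gen`. The tag convention as encoded in the regular expression, the `split_docs` helper and the sample text are all hypothetical; the thread does not specify any of them.

```python
import re

# Assumed tag convention: a line consisting only of "# <name>_gen".
TAG = re.compile(r"^#\s*(\w+_gen)\s*$")


def split_docs(text):
    """Map each class-name tag to the documentation lines that follow it."""
    docs = {}
    current = None
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            # A new tag starts a new section.
            current = match.group(1)
            docs[current] = []
        elif current is not None:
            docs[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in docs.items()}


sample = """\
# norm_gen
Normal Distribution
-------------------
Notes
^^^^^
norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

# dist2_gen
Distribution number 2
---------------------
"""

docs = split_docs(sample)
print(sorted(docs))
```

The text for each tag could then be attached to the matching class in distributions.py, though how that would interact with the existing %(example)s templates is left open here.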
> Nicky
>
> Example for the examples section of the docstring of norm.

This example is good. Perhaps the frozen distribution needs a few words of explanation. I suggest doing a few more of these for common distributions, and linking to the norm() docstring from less common distributions. Other than that, I wouldn't change anything about the docstrings. Built docs could be reworked more thoroughly.

Ralf
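
As a possible starting point for those few words on frozen distributions: calling a distribution such as norm with loc and scale (and any shape arguments) returns a frozen object that remembers them, so later method calls no longer need them. A minimal sketch, with arbitrary illustrative parameter values:

```python
import numpy as np
from scipy.stats import norm

# Freeze loc=1, scale=2 once.
rv = norm(loc=1.0, scale=2.0)

# Methods on the frozen object match the unfrozen calls with explicit arguments.
print(rv.cdf(0.0), norm.cdf(0.0, loc=1.0, scale=2.0))
print(rv.mean(), rv.std())

# Random variates drawn from the frozen distribution.
sample = rv.rvs(size=(2, 3))
print(sample.shape)
```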
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

Hi Ralf,

Sorry for being so slow at getting back to your comments. I have definitely not forgotten this mail. However, for the next few weeks I have to do a considerable amount of teaching... Once my workload is a bit lower, I'll come up with a plan.

Nicky

On 21 September 2012 21:27, Ralf Gommers <ralf.gommers@gmail.com> wrote:
participants (2)
- nicky van foreest
- Ralf Gommers