Hi,
Below are two proposals to handle the documentation of the scipy distributions.
The first is to add a set of examples to each distribution, see the
list at the end of the mail as an example. However, I actually wonder
whether it wouldn't be better to put this stuff in the stats tutorial.
(I recently updated this, but given the list below, it is still not
complete.) The list below is a bit long... too long perhaps.
I actually get the feeling that, given the enormous diversity of the
distributions, it may not be possible to automatically generate a set
of simple examples that work for each and every distribution. Such
examples would then involve the usage of x.dist.b, and so on, and this
is not particularly revealing to first (and second) time users.
A possible resolution is to include just one or two generic examples
in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer
to the tutorial for the rest. The tutorial then should show extensive
examples for each method of the norm distribution. I assume that then
any user of other distributions can figure out how to proceed for
his/her own distribution.
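
As a rough sketch of how one generic example could be filled in per
distribution: the snippet below uses plain %-substitution on a shared
template. The names GENERIC_EXAMPLES, examples_for and shape_values are
hypothetical and only for illustration; they are not taken from
distributions.py.

```python
# Hypothetical template: these names are illustrative, not scipy's own.
GENERIC_EXAMPLES = """\
Examples
--------
>>> from scipy.stats import %(name)s
>>> %(name)s.rvs(%(shapes)ssize=(2, 3))

See the stats tutorial for worked examples of every method.
"""


def examples_for(name, shape_values=()):
    """Return the generic examples section for one distribution."""
    # Distributions with shape parameters need them before `size`.
    shapes = "".join("%s, " % value for value in shape_values)
    return GENERIC_EXAMPLES % {"name": name, "shapes": shapes}


norm_doc = examples_for("norm")
gamma_doc = examples_for("gamma", shape_values=(2.0,))
print(norm_doc)
print(gamma_doc)
```

The only per-distribution input here is the name and some valid shape
values, which is about as much as can be generated automatically
without running into the x.dist.b kind of detail.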
The second possibility would be to follow Josef's suggestion:
--snip snip
Splitting up the distributions pdf docs in tutorial into separate
pages for individual distributions, make them nicer with code and
graphs and link them from the docstring of the distribution.
This would keep the docstring itself from blowing up, but we could get
the full html reference if we need to.
--snip snip
This idea offers a lot of opportunities. In a previous mail I
mentioned that I don't quite like that the documentation is spread
over multiple documents. There are doc strings in distributions.py
(leading to a bloated file), and there is continuous.rst. Part of the
implementation can be understood from the doc-string, typically, the
density function, but not the rest; this requires continuous.rst.
Besides this, in case some specific distribution requires extra
explanation/examples, this will have to be put in the doc-string, making
distributions.py longer still. Thus, to take up Josef's suggestion,
what about a documentation file organised like this:
# some tag to tell that these are the docs for the norm distribution
# eg.
# norm_gen
Normal Distribution
----------------------------
Notes
^^^^^^^
# should be used by the interpreter
The probability density function for `norm` is::
norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
Simple Examples
^^^^^^^^^^^^^^^^^^^^
# used by the interpreter
>>> norm.rvs( size = (2,3) )
Extensive Examples
^^^^^^^^^^^^^^^^^^^^^^^^
# Not used by the interpreter, but certainly by an HTML viewer,
containing graphs, hard/specific examples.
Mathematical Details
^^^^^^^^^^^^^^^^^^^^^^
Stuff from continuous.rst
# dist2_gen
Distribution number 2
-----------------------------------------
etc
It shouldn't be too hard to parse such a document and couple each
piece of documentation to a distribution in distributions.py, since we
use the class name as the tag in the documentation file (or am I
mistaken?). The doc-string for a distribution in distributions.py can
then be removed.
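
A minimal sketch of such a parser, under some assumptions that are not
fixed by the layout above: a tag is a comment line of the form
"# <classname>_gen", any other line starting with "#" is an annotation
to be dropped, and section titles are underlined with "^". The function
parse_docs and the stand-in class are hypothetical.

```python
import re

# Assumed tag format, e.g. "# norm_gen".
TAG = re.compile(r"^#\s*(\w+_gen)\s*$")


def parse_docs(text):
    """Map each class-name tag to a dict of {section title: section body}."""
    docs = {}
    sections = None  # sections of the distribution currently being read
    body = None      # lines of the section currently being read
    lines = text.splitlines()
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        stripped = line.strip()
        tag = TAG.match(line)
        if tag:
            sections = docs.setdefault(tag.group(1), {})
            body = None
        elif sections is None or line.startswith("#"):
            continue  # text before the first tag, or an annotation
        elif stripped and set(nxt.strip()) == {"^"}:
            body = sections.setdefault(stripped, [])  # section title
        elif stripped and set(stripped) <= {"^", "-"}:
            continue  # underline of a title
        elif body is not None:
            body.append(line)
    return {name: {title: "\n".join(text_lines).strip()
                   for title, text_lines in secs.items()}
            for name, secs in docs.items()}


SAMPLE = """\
# norm_gen
Normal Distribution
-------------------
Notes
^^^^^
# should be used by the interpreter
The probability density function for `norm` is::

    norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)

Simple Examples
^^^^^^^^^^^^^^^
>>> norm.rvs(size=(2, 3))
"""


class norm_gen:  # stand-in for the real class in distributions.py
    pass


docs = parse_docs(SAMPLE)
# Only the sections meant for the interpreter go into the docstring.
norm_gen.__doc__ = "\n\n".join(
    docs["norm_gen"][title] for title in ("Notes", "Simple Examples"))
print(norm_gen.__doc__)
```

The remaining sections (extensive examples, mathematical details) stay
in the parsed dictionary and could be handed to the HTML build instead.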
Nicky
Example for the examples section of the docstring of norm.
Notes
-----
The probability density function for `norm` is::
norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
#%(example)s
Examples
--------
Setting the mean and standard deviation:
>>> from scipy.stats import norm
>>> norm.cdf(0.0)
>>> norm.cdf(0., 1) # set mu = loc = 1
>>> norm.cdf(0., 1, 2) # mu = loc = 1, scale = sigma = 2
>>> norm.cdf(0., loc = 1, scale = 2) # mu = loc = 1, scale = sigma = 2
Frozen rvs
>>> norm(1., 2.).cdf(0)
>>> x = norm(scale = 2.)
>>> x.cdf(0.0)
Moments
>>> norm(loc = 2).stats()
>>> norm.mean()
>>> norm.moment(2, scale = 3.)
>>> x.std()
>>> x.var()
Random number generation
>>> norm.rvs(3, 1, size = (2,3)) # loc = 3, scale = 1, array of shape (2,3)
>>> norm.rvs(3, 1, size = [2,3])
>>> x.rvs(3) # array with 3 random deviates
>>> x.rvs([3,4]) # array of shape (3,4) with deviates
Expectations
>>> norm.expect(lambda x: x, loc = 1) # 1.00000
>>> norm.expect(lambda x: x**2, loc = 1., scale = 2.) # second moment
Support of the distribution
>>> norm.a # left limit, -np.inf here
>>> norm.b # right limit, np.inf here
Plot of the cdf
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 3)
>>> P = norm.cdf(x)
>>> plt.plot(x, P)
>>> plt.show()
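
For readers who want to check what the loc and scale arguments in the
cdf examples do, here is a small cross-check that uses only the
standard library's statistics.NormalDist rather than scipy. It relies
on the usual standardisation, cdf(x, loc, scale) = Phi((x - loc)/scale),
and on the density given in the Notes; the variable names are
illustrative only.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

std = NormalDist()                        # mu = 0, sigma = 1
shifted = NormalDist(mu=1.0, sigma=2.0)   # mu = loc = 1, sigma = scale = 2

# cdf at 0 with loc = 1, scale = 2 equals the standard cdf at (0 - 1) / 2.
lhs = shifted.cdf(0.0)
rhs = std.cdf((0.0 - 1.0) / 2.0)

# Density from the Notes section: exp(-x**2/2)/sqrt(2*pi).
x = 0.7
pdf_formula = exp(-x**2 / 2) / sqrt(2 * pi)

print(std.cdf(0.0), lhs, rhs)
print(pdf_formula, std.pdf(x))
```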