[Python-Dev] PEP 287: reStructuredText Standard Docstring Format
Tue, 02 Apr 2002 00:28:17 -0500
Here's a serious proposal, safe to post now that April Fool's is over.
Please read & comment.
David Goodger email@example.com Open-source projects:
- Python Docstring Processing System: http://docstring.sourceforge.net
- reStructuredText: http://structuredtext.sourceforge.net
- The Go Tools Project: http://gotools.sourceforge.net
Title: reStructuredText Standard Docstring Format
Version: $Revision: 1.3 $
Last-Modified: $Date: 2002/04/02 03:50:38 $
Author: firstname.lastname@example.org (David Goodger)
When plaintext hasn't been expressive enough for inline
documentation, Python programmers have sought out a format for
docstrings. This PEP proposes that the reStructuredText markup
_ be adopted as the standard markup format for structured
plaintext documentation in Python docstrings, and for PEPs and
ancillary documents as well. reStructuredText is a rich and
extensible yet easy-to-read, what-you-see-is-what-you-get
plaintext markup syntax.
Only the low-level syntax of docstrings is addressed here. This
PEP is not concerned with docstring semantics or processing at
all. Nor is it an attempt to deprecate pure plaintext docstrings,
which are always going to be legitimate. The reStructuredText
markup is an alternative for those who want more expressive
docstrings.
Programmers are by nature a lazy breed. We reuse code with
functions, classes, modules, and subsystems. Through its
docstring syntax, Python allows us to document our code from
within. The "holy grail" of the Python Documentation Special
Interest Group (Doc-SIG) _ has been a markup syntax and toolset
to allow auto-documentation, where the docstrings of Python
systems can be extracted in context and processed into useful,
high-quality documentation for multiple purposes.
The proposed format (reStructuredText) is entirely readable in
plaintext format, and many of the markup forms match common usage
(e.g., ``*emphasis*``), so it reads quite naturally. Yet it is
rich enough to produce complex documents, and extensible so that
there are few limits.
The reStructuredText parser is available now. The Docutils
project is at the point where standalone reStructuredText
documents can be converted to HTML; other output format writers
will become available over time. Work is progressing on a Python
source "Reader" which will implement auto-documentation. Authors
of existing auto-documentation tools are encouraged to integrate
the reStructuredText parser into their projects, or better yet, to
join forces to produce a world-class toolset for the Python
standard library.
Tools will become available in the near future, which will allow
programmers to generate HTML for online help, XML for multiple
purposes, and perhaps eventually PDF/DocBook/LaTeX for printed
documentation, essentially "for free" from the existing
docstrings. The adoption of a standard will, at the very least,
benefit docstring processing tools by preventing further
"reinventing the wheel".
Eventually PyDoc, the one existing standard auto-documentation
tool, could have reStructuredText support added. In the interim
it will have no problem with reStructuredText markup, since it
treats all docstrings as plaintext.
These are the generally accepted goals for a docstring format, as
discussed in the Doc-SIG:
1. It must be readable in source form by the casual observer.
2. It must be easy to type with any standard text editor.
3. It must not need to contain information which can be deduced
from parsing the module.
4. It must contain sufficient information (structure) so it can be
converted to any reasonable markup format.
5. It must be possible to write a module's entire documentation in
docstrings, without feeling hampered by the markup language.
reStructuredText meets and exceeds all of these goals, and sets
its own goals as well, even more stringent. See "Features" below.
The goals of this PEP are as follows:
1. To establish reStructuredText as a standard docstring format by
attaining "accepted" status (Python community consensus; BDFL
pronouncement). Once reStructuredText is a Python standard,
effort can be focused on tools instead of arguing for a
standard. Python needs a standard set of documentation tools.
2. To address any related concerns raised by the Python community.
3. To encourage community support. As long as multiple competing
markups are out there, the development community remains
fractured. Once a standard exists, people will start to use
it, and momentum will inevitably gather.
4. To consolidate efforts from related auto-documentation
projects. It is hoped that interested developers will join
forces and work on a joint/merged/common implementation.
5. To adopt reStructuredText as the standard markup for PEPs. One
or both of the following strategies may be applied:
a) Keep the existing PEP section structure constructs (one-line
section headers, indented body text). Subsections can
either be forbidden or supported with underlined headers in
the indented body text.
b) Replace the PEP section structure constructs with the
reStructuredText syntax. Section headers will require
underlines, subsections will be supported out of the box,
and body text need not be indented (except for block
quotes).
Support for RFC 2822 headers will be added to the
reStructuredText parser (unambiguous given a specific context:
the first contiguous block of a PEP document). It may be
desired to concretely specify what over/underline styles are
allowed for PEP section headers, for uniformity.
6. To adopt reStructuredText as the standard markup for
README-type files and other standalone documents in the Python
distribution.
The lack of a standard syntax for docstrings has hampered the
development of standard tools for extracting and converting
docstrings into documentation in standard formats (e.g., HTML,
DocBook, TeX). There have been a number of proposed markup
formats and variations, and many tools tied to these proposals,
but without a standard docstring format they have failed to gain a
strong following and/or floundered half-finished.
Throughout the existence of the Doc-SIG, consensus on a single
standard docstring format has never been reached. A lightweight,
implicit markup has been sought, for the following reasons (among
others):
1. Docstrings written within Python code are available from within
the interactive interpreter, and can be 'print'ed. Thus the
use of plaintext for easy readability.
2. Programmers want to add structure to their docstrings, without
sacrificing raw docstring readability. Unadorned plaintext
cannot be transformed ('up-translated') into useful structured
formats.
3. Explicit markup (like XML or TeX) is widely considered
unreadable by the uninitiated.
4. Implicit markup is aesthetically compatible with the clean and
minimalist Python syntax.
Proposed alternatives have included:
- XML _, SGML _, DocBook _, HTML _, XHTML _
XML and SGML are explicit, well-formed meta-languages suitable
for all kinds of documentation. XML is a variant of SGML. They
are best used behind the scenes, because they are verbose,
difficult to type, and too cluttered to read comfortably as
source. DocBook, HTML, and XHTML are all applications of SGML
and/or XML, and all share the same basic syntax and the same
shortcomings.
- TeX _
TeX is similar to XML/SGML in that it's explicit, not very easy
to write, and not easy for the uninitiated to read.
- Perl POD _
Most Perl modules are documented in a format called POD -- Plain
Old Documentation. This is an easy-to-type, very low level
format with strong integration with the Perl parser. Many tools
exist to turn POD documentation into other formats: info, HTML
and man pages, among others. However, the POD syntax takes
after Perl itself in terms of readability.
- JavaDoc _
Special comments before Java classes and functions serve to
document the code. A program to extract these, and turn them
into HTML documentation is called javadoc, and is part of the
standard Java distribution. However, the only output format
that is supported is HTML, and JavaDoc has a very intimate
relationship with HTML, using HTML tags for most markup. Thus
it shares the readability problems of HTML.
- Setext _, StructuredText _
Early on, variants of Setext (Structure Enhanced Text),
including Zope Corp's StructuredText, were proposed for Python
docstring formatting. Hereafter these variants will
collectively be called 'STexts'. STexts have the advantage of
being easy to read without special knowledge, and relatively
easy to write.
Although used by some (including in most existing Python
auto-documentation tools), until now STexts have failed to
become standard because:
- STexts have been incomplete. Lacking "essential" constructs
that people want to use in their docstrings, STexts are
rendered less than ideal. Note that these "essential"
constructs are not universal; everyone has their own
requirements.
- STexts have been sometimes surprising. Bits of text are
unexpectedly interpreted as being marked up, leading to user
frustration.
- SText implementations have been buggy.
- Most STexts have had no formal specification except for
the implementation itself. A buggy implementation meant a
buggy spec, and vice-versa.
- There has been no mechanism to get around the SText markup
rules when a markup character is used in a non-markup context.
Proponents of implicit STexts have vigorously opposed proposals
for explicit markup (XML, HTML, TeX, POD, etc.), and the debates
have continued off and on since 1996 or earlier.
reStructuredText is a complete revision and reinterpretation of
the SText idea, addressing all of the problems listed above.
Rather than repeating or summarizing the extensive
reStructuredText spec, please read the originals available from
http://structuredtext.sourceforge.net/spec/ (.txt & .html files).
Reading the documents in the following order is recommended:
- An Introduction to reStructuredText _
- Problems With StructuredText _ (optional for those who have
used StructuredText; it explains many markup decisions made)
- reStructuredText Markup Specification _
- A Record of reStructuredText Syntax Alternatives _ (explains
markup decisions made independently of StructuredText)
- reStructuredText Directives _
There is also a "Quick reStructuredText" user reference _.
A summary of features addressing often-raised docstring markup
concerns:
- A markup escaping mechanism.
Backslashes (``\``) are used to escape markup characters when
needed for non-markup purposes. However, the inline markup
recognition rules have been constructed in order to minimize the
need for backslash-escapes. For example, although asterisks are
used for *emphasis*, in non-markup contexts such as "*" or "(*)"
or "x * y", the asterisks are not interpreted as markup and are
left unchanged. For many non-markup uses of backslashes (e.g.,
describing regular expressions), inline literals or literal
blocks are applicable; see the next item.
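
The recognition rules described above can be approximated with a single regular expression. What follows is only a rough sketch in present-day Python, not the reStructuredText parser's actual rules: the names `EMPHASIS_RE` and `find_emphasis` are hypothetical, and the "not followed by a closing quote or bracket" test is a simplification of the real rule about matching quote pairs.

```python
import re

# Simplified emphasis recognizer (an assumption-laden sketch, not the real parser).
EMPHASIS_RE = re.compile(r"""
    (?<![\\\w*])        # start-string is not backslash-escaped and not mid-word
    \*                  # the start-string itself
    (?=[^\s"')\]}>])    # next char is neither whitespace nor a closing quote/bracket
    (?P<text>[^*]+?)    # the emphasized text
    (?<=\S)             # end-string is immediately preceded by non-whitespace
    \*                  # the end-string
    (?![\w*])           # end-string is not followed by a word character
""", re.VERBOSE)


def find_emphasis(text):
    """Return the emphasized spans found in ``text``."""
    return [match.group("text") for match in EMPHASIS_RE.finditer(text)]


sample = 'Use *emphasis*, but "*" or "(*)" or x * y stay literal.'
print(find_emphasis(sample))
print(find_emphasis(r"2 \* 3 \* 4"))
```

Under these simplified rules only the first asterisk pair is treated as markup; the quoted, parenthesized, spaced, and backslash-escaped asterisks are left alone.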
- Markup to include Python source code and Python interactive
sessions: inline literals, literal blocks, and doctest blocks.
Inline literals use ``double-backquotes`` to indicate program
I/O or code snippets. No markup interpretation (including
backslash-escape [``\``] interpretation) is done within inline
literals.
Literal blocks (block-level literal text, such as code excerpts
or ASCII graphics) are indented, and indicated with a
double-colon ("::") at the end of the preceding paragraph (right
here)::

    text = 'is left as-is'
    spaces_and_linebreaks = 'are preserved'
    markup_processing = None

Doctest blocks begin with ">>> " and end with a blank line.
Neither indentation nor literal block double-colons are
required. For example::
Here's a doctest block:
>>> print 'Python-specific usage examples; begun with ">>>"'
Python-specific usage examples; begun with ">>>"
>>> print '(cut and pasted from interactive sessions)'
(cut and pasted from interactive sessions)
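
Because doctest blocks reuse the interactive-prompt syntax, they can be extracted and checked mechanically. A minimal sketch using the standard `doctest` module follows; it is written in present-day Python 3 (so `print` is a function), and the docstring text and the helper names `extract_examples` and `run_examples` are illustrative assumptions rather than part of this proposal.

```python
import doctest

# Illustrative docstring text containing a doctest block.
DOCSTRING = """
Here's a doctest block:

>>> print('Python-specific usage examples; begun with ">>>"')
Python-specific usage examples; begun with ">>>"
>>> print('(cut and pasted from interactive sessions)')
(cut and pasted from interactive sessions)
"""


def extract_examples(text):
    """Return (source, expected_output) pairs for each ">>> " example."""
    parser = doctest.DocTestParser()
    return [(ex.source, ex.want) for ex in parser.get_examples(text)]


def run_examples(text, name="docstring"):
    """Execute the examples and return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


for source, want in extract_examples(DOCSTRING):
    print(repr(source), "->", repr(want))
print(run_examples(DOCSTRING))
```

The block ends at the first blank line, which is why no indentation or double-colon is needed to delimit it.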
- Markup that isolates a Python identifier: interpreted text.
Text enclosed in single backquotes is recognized as "interpreted
text", whose interpretation is application-dependent. In the
context of a Python docstring, the default interpretation of
interpreted text is as Python identifiers. The text will be
marked up with a hyperlink connected to the documentation for
the identifier given. Lookup rules are the same as in Python
itself: LGB namespace lookups (local, global, builtin). The
"role" of the interpreted text (identifying a class, module,
function, etc.) is determined implicitly from the namespace
lookup. For example::

    class Keeper(Storer):
        """
        Keep data fresher longer.

        Extend `Storer`. Class attribute `instances` keeps track
        of the number of `Keeper` objects instantiated.
        """

        instances = 0
        """How many `Keeper` objects are there?"""

        def __init__(self):
            """
            Extend `Storer.__init__()` to keep track of
            instances. Keep count in `self.instances` and data
            in `self.data`.
            """
            Storer.__init__(self)
            self.instances += 1
            self.data = []
            """Store data in a list, most recent last."""

        def storedata(self, data):
            """
            Extend `Storer.storedata()`; append new `data` to a
            list (in `self.data`).
            """
            self.data.append(data)

Each piece of interpreted text is looked up according to the
local namespace of the block containing its docstring.
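
The LGB lookup and the implicit role can be sketched in a few lines. This is only an illustration of the idea, assuming namespaces are available as plain dictionaries; `resolve`, `role_of`, and the role labels are hypothetical names, not part of any existing tool.

```python
import builtins


def resolve(name, local_ns, global_ns):
    """Resolve a dotted identifier in local, global, builtin order (LGB)."""
    head, *rest = name.rstrip("()").split(".")
    for namespace in (local_ns, global_ns, vars(builtins)):
        if head in namespace:
            obj = namespace[head]
            break
    else:
        raise LookupError(f"cannot resolve {name!r}")
    for attr in rest:
        obj = getattr(obj, attr)  # follow the remaining dotted parts
    return obj


def role_of(obj):
    """Guess a role from the kind of object found (labels are assumptions)."""
    if isinstance(obj, type):
        return "class"
    if callable(obj):
        return "function"
    return "attribute"


class Storer:
    def storedata(self, data):
        pass


global_ns = {"Storer": Storer}
local_ns = {"instances": 0}

for text in ("Storer", "Storer.storedata()", "instances", "len"):
    print(text, "->", role_of(resolve(text, local_ns, global_ns)))
```

A real source reader would obtain the namespaces by parsing the module rather than from hand-built dictionaries.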
- Markup that isolates a Python identifier and specifies its type:
interpreted text with roles.
Although the Python source context reader is designed not to
require explicit roles, they may be used. To classify
identifiers explicitly, the role is given along with the
identifier in either prefix or suffix form::
Use :method:`Keeper.storedata` to store the object's data in
The syntax chosen for roles is verbose, but necessarily so (if
anyone has a better alternative, please post it to the Doc-SIG).
The intention of the markup is that there should be little need
to use explicit roles; their use is to be kept to an absolute
minimum.
- Markup for "tagged lists" or "label lists": field lists.
Field lists represent a mapping from field name to field body.
These are mostly used for extension syntax, such as
"bibliographic field lists" (representing document metadata such
as author, date, and version) and extension attributes for
directives (see below). They may be used to implement docstring
semantics, such as identifying parameters, exceptions raised,
etc.; such usage is beyond the scope of this PEP.
A modified RFC 2822 syntax is used, with a colon *before* as
well as *after* the field name. Field bodies are more versatile
as well; they may contain multiple body elements (even nested
field lists). For example::
Standard RFC 2822 header syntax cannot be used for this
construct because it is ambiguous. A word followed by a colon
at the beginning of a line is common in written text.
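
To illustrate why the leading colon removes the ambiguity, here is a small sketch of a field-list recognizer. It is not the reStructuredText parser; `FIELD_RE`, `parse_fields`, and the sample field values are assumptions made for the example.

```python
import re

# A field marker: colon, field name, colon, then an optional one-line body.
FIELD_RE = re.compile(r"^:(?P<name>[^:\s][^:]*):(?:\s+(?P<body>.*))?$")


def parse_fields(lines):
    """Map each field name to the list of text lines in its body."""
    fields = {}
    current = None
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            current = match.group("name")
            body = match.group("body")
            fields[current] = [body] if body else []
        elif current is not None and line[:1].isspace() and line.strip():
            # Indented continuation lines belong to the current field body.
            fields[current].append(line.strip())
        elif line.strip():
            current = None  # ordinary text ends the field list
    return fields


sample = [
    ":Date: 2002-03-22",
    ":Version: 1",
    ":Authors:",
    "    * Me",
    "    * Myself",
    "Note: this is ordinary text, not a field.",
]
print(parse_fields(sample))
```

The last line starts with a word followed by a colon, as ordinary prose often does, and is correctly ignored because it lacks the leading colon.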
- Markup extensibility: directives and substitutions.
Directives are used as an extension mechanism for
reStructuredText, a way of adding support for new block-level
constructs without adding new syntax. Directives for images,
admonitions (note, caution, etc.), and tables of contents
generation (among others) have been implemented. For example,
here's how to place an image::
.. image:: mylogo.png
Substitution definitions allow the power and flexibility of
block-level directives to be shared by inline text. For
example::
The |biohazard| symbol must be used on containers used to
dispose of medical waste.
.. |biohazard| image:: biohazard.png
- Section structure markup.
Section headers in reStructuredText use adornment via underlines
(and possibly overlines) rather than indentation. For example::

    This is a Section Title
    =======================

    This is a Subsection Title
    --------------------------

    This paragraph is in the subsection.

    This is Another Section Title
    =============================

    This paragraph is in the second section.

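
As a rough illustration of underline-based sectioning, the sketch below finds titles by looking for a text line followed by a line consisting of one repeated punctuation character, and assigns levels in order of first appearance of each underline style. The function name `find_sections` is hypothetical, overlines are ignored, and the real parser's rules are more thorough.

```python
import string


def find_sections(lines):
    """Return (level, title) pairs for underlined section titles."""
    styles = []    # underline characters, in order of first use
    sections = []
    for title, underline in zip(lines, lines[1:]):
        text = title.strip()
        adorn = underline.rstrip()
        if (text and adorn
                and adorn[0] in string.punctuation
                and adorn == adorn[0] * len(adorn)
                and len(adorn) >= len(text)):
            if adorn[0] not in styles:
                styles.append(adorn[0])
            sections.append((styles.index(adorn[0]) + 1, text))
    return sections


document = """\
This is a Section Title
=======================

This is a Subsection Title
--------------------------

This paragraph is in the subsection.

This is Another Section Title
=============================

This paragraph is in the second section.
"""
print(find_sections(document.splitlines()))
```

Because levels come from the order in which underline styles are first seen, no particular character is tied to a particular depth.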
Questions & Answers
Q1: Is reStructuredText rich enough?
A1: Yes, it is for most people. If it lacks some construct that
is required for a specific application, it can be added via
the directive mechanism. If a common construct has been
overlooked and a suitably readable syntax can be found, it can
be added to the specification and parser.
Q2: Is reStructuredText *too* rich?
A2: For specific applications or individuals, perhaps. In
general, no.
Since the very beginning, whenever a markup syntax has been
proposed on the Doc-SIG, someone has complained about the lack
of support for some construct or other. The reply was often
something like, "These are docstrings we're talking about, and
docstrings shouldn't have complex markup." The problem is
that a construct that seems superfluous to one person may be
absolutely essential to another.
reStructuredText takes the opposite approach: it provides a
rich set of implicit markup constructs (plus a generic
extension mechanism for explicit markup), allowing for all
kinds of documents. If the set of constructs is too rich for
a particular application, the unused constructs can either be
removed from the parser (via application-specific overrides)
or simply omitted by convention.
Q3: Why not use indentation for section structure, like
StructuredText does? Isn't it more "Pythonic"?
A3: Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG
post:
I still think that using indentation to indicate
sectioning is wrong. If you look at how real books and
other print publications are laid out, you'll notice that
indentation is used frequently, but mostly at the
intra-section level. Indentation can be used to offset
lists, tables, quotations, examples, and the like. (The
argument that docstrings are different because they are
input for a text formatter is wrong: the whole point is
that they are also readable without processing.)
I reject the argument that using indentation is Pythonic:
text is not code, and different traditions and conventions
hold. People have been presenting text for readability
for over 30 centuries. Let's not innovate needlessly.
See "Section Structure via Indentation" in "Problems With
StructuredText" _ for further elaboration.
Q4: Why use reStructuredText for PEPs? What's wrong with the
existing standard?
A4: The existing standard for PEPs is very limited in terms of
general expressibility, and referencing is especially lacking
for such a reference-rich document type. PEPs are currently
converted into HTML, but the results (mostly monospaced text)
are less than attractive, and most of the value-added
potential of HTML is untapped.
Making reStructuredText the standard markup for PEPs will
enable much richer expression, including support for section
structure, inline markup, graphics, and tables. In several
PEPs there are ASCII graphics diagrams, which are all that
plaintext documents can support. Since PEPs are made
available in HTML form, the ability to include proper diagrams
would be immediately useful.
Current PEP practices allow for reference markers in the form
"[1]" in the text, and the footnotes/references themselves are
listed in a section toward the end of the document. There is
currently no hyperlinking between the reference marker and the
footnote/reference itself (it would be possible to add this to
pep2html.py, but the "markup" as it stands is ambiguous and
mistakes would be inevitable). A PEP with many references
(such as this one ;-) requires a lot of flipping back and
forth. When revising a PEP, often new references are added or
unused references deleted. It is painful to renumber the
references, since it has to be done in two places and can have
a cascading effect (insert a single new reference 1, and every
other reference has to be renumbered; always adding new
references to the end is suboptimal). It is easy for
references to go out of sync.
PEPs use references for two purposes: simple URL references
and footnotes. reStructuredText differentiates between the
two. A PEP might contain references like this::

    This PEP proposes adding frungible doodads [1] to the
    core. It extends PEP 9876 [2] via the BCA [3]

    References and Footnotes

    [1] http://www.example.org/

    [2] PEP 9876, Let's Hope We Never Get Here

    [3] "Bogus Complexity Addition"

Reference 1 is a simple URL reference. Reference 2 is a
footnote containing text and a URL. Reference 3 is a footnote
containing text only. Rewritten using reStructuredText, this
PEP could look like this::
This PEP proposes adding `frungible doodads`_ to the
core. It extends PEP 9876 [#pep9876]_ via the BCA [#]_
.. _frungible doodads: http://www.example.org/
.. [#pep9876] `PEP 9876`__, Let's Hope We Never Get Here
.. [#] "Bogus Complexity Addition"
URLs and footnotes can be defined close to their references if
desired, making them easier to read in the source text, and
making the PEPs easier to revise. The "References and
Footnotes" section can be auto-generated with a document tree
transform. Footnotes from throughout the PEP would be
gathered and displayed under a standard header. If URL
references should likewise be written out explicitly (in
citation form), another tree transform could be used.
URL references can be named ("frungible doodads"), and can be
referenced from multiple places in the document without
additional definitions. When converted to HTML, references
will be replaced with inline hyperlinks (HTML <A> tags). The
two footnotes are automatically numbered, so they will always
stay in sync. The first footnote also contains an internal
reference name, "pep9876", so it's easier to see the
connection between reference and footnote in the source text.
Named footnotes can be referenced multiple times, maintaining
consistent numbering.
The "#pep9876" footnote could also be written in the form of a
citation::
It extends PEP 9876 [PEP9876]_ ...
.. [PEP9876] `PEP 9876`_, Let's Hope We Never Get Here
Footnotes are numbered, whereas citations use text for their
labels.
Q5: Wouldn't it be better to keep the docstring and PEP proposals
separate?
A5: The PEP markup proposal may be removed if it is deemed that
there is no need for PEP markup, or it could be made into a
separate PEP. If accepted, PEP 1, PEP Purpose and Guidelines
_, and PEP 9, Sample PEP Template _ will be updated.
It seems natural to adopt a single consistent markup standard
for all uses of structured plaintext in Python, and to propose
it all in one place.
Q6: The existing pep2html.py script converts the existing PEP
format to HTML. How will the new-format PEPs be converted to
HTML?
A6: One of the deliverables of the Docutils project _ will be
a new version of pep2html.py with integrated reStructuredText
parsing. The Docutils project will support PEPs with a "PEP
Reader" component, including all functionality currently in
pep2html.py (auto-recognition of PEP & RFC references).
Q7: Who's going to convert the existing PEPs to reStructuredText?
A7: A call for volunteers will be put out to the Doc-SIG and
greater Python communities. If insufficient volunteers are
forthcoming, I (David Goodger) will convert the documents
myself, perhaps with some level of automation. A transitional
system whereby both old and new standards can coexist will be
easy to implement (and I pledge to implement it if necessary).
Q8: Why use reStructuredText for README and other ancillary files?
A8: The reasoning given for PEPs in A4 above also applies to
README and other ancillary files. By adopting a standard
markup, these files can be converted to attractive
cross-referenced HTML and put up on python.org. Developers of
Python projects can also take advantage of this facility for
their own documentation.
Q9: Won't the superficial similarity to existing markup
conventions cause problems, and result in people writing
invalid markup (and not noticing, because the plaintext looks
natural)? How forgiving is reStructuredText of "not quite
right" markup?
A9: There will be some mis-steps, as there would be when moving
from one programming language to another. As with any
language, proficiency grows with experience. Luckily,
reStructuredText is a very little language indeed.
As with any syntax, there is the possibility of syntax errors.
It is expected that a user will run the processing system over
their input and check the output for correctness.
In a strict sense, the reStructuredText parser is very
unforgiving (as it should be; "In the face of ambiguity,
refuse the temptation to guess" _ applies to parsing
markup as well as computer languages). Here's a design goal
from "An Introduction to reStructuredText" _:
3. Unambiguous. The rules for markup must not be open for
interpretation. For any given input, there should be
one and only one possible output (including error
output).
While unforgiving, at the same time the parser does try to be
helpful by producing useful diagnostic output ("system
messages"). The parser reports problems, indicating their
level of severity (from least to most: debug, info, warning,
error, severe). The user or the client software can decide on
reporting thresholds; they can ignore low-level problems or
cause high-level problems to bring processing to an immediate
halt. Problems are reported during the parse as well as
included in the output, often with two-way links between the
source of the problem and the system message explaining it.
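
The threshold behaviour described above might be modelled as follows. This is a sketch only: `Reporter`, `HaltProcessing`, and the default thresholds are hypothetical names and values chosen for illustration, not the Docutils API.

```python
# Severity levels, from least to most severe, as listed in the text.
LEVELS = ["debug", "info", "warning", "error", "severe"]


class HaltProcessing(Exception):
    """Raised when a problem reaches the halt threshold."""


class Reporter:
    def __init__(self, report_level="warning", halt_level="severe"):
        self.report_rank = LEVELS.index(report_level)
        self.halt_rank = LEVELS.index(halt_level)
        self.messages = []  # system messages kept for inclusion in the output

    def report(self, level, text, line=None):
        rank = LEVELS.index(level)
        if rank >= self.report_rank:
            self.messages.append((level, line, text))
        if rank >= self.halt_rank:
            raise HaltProcessing(f"{level}: {text}")


reporter = Reporter()
reporter.report("info", "blank line missing before literal block", line=7)
reporter.report("warning", "title underline too short", line=12)
print(reporter.messages)
```

Client software that wants strict checking could pass a lower `halt_level`, while a lenient client could raise `report_level` to ignore minor problems.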
Q10: Will the docstrings in the Python standard library modules be
converted to reStructuredText?
A10: Over time, with the help of the developer community, many
modules will be converted. Some modules may never be
converted. A future toolset will have to allow for
this.
References & Footnotes
 An Introduction to reStructuredText
 Problems with StructuredText
 reStructuredText Markup Specification
 A Record of reStructuredText Syntax Alternatives
 reStructuredText Directives
 Quick reStructuredText
 PEP 1, PEP Guidelines, Warsaw, Hylton
 PEP 9, Sample PEP Template, Warsaw
 From "The Zen of Python (by Tim Peters)",
 PEP 216, Docstring Format, Zadka
This document has been placed in the public domain.
Some text is borrowed from PEP 216, Docstring Format _, by
Special thanks to all members past & present of the Python Doc-SIG.