[Python-Dev] PEP 287: reStructuredText Standard Docstring Format

David Goodger goodger@users.sourceforge.net
Tue, 02 Apr 2002 00:28:17 -0500

Here's a serious proposal, safe to post now that April Fool's is over.
Please read & comment.

David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net

PEP: 287
Title: reStructuredText Standard Docstring Format
Version: $Revision: 1.3 $
Last-Modified: $Date: 2002/04/02 03:50:38 $
Author: goodger@users.sourceforge.net (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Informational
Created: 25-Mar-2002
Post-History: 02-Apr-2002
Replaces: 216


Abstract

    When plaintext hasn't been expressive enough for inline
    documentation, Python programmers have sought out a format for
    docstrings.  This PEP proposes that the reStructuredText markup
    [1]_ be adopted as the standard markup format for structured
    plaintext documentation in Python docstrings, and for PEPs and
    ancillary documents as well.  reStructuredText is a rich and
    extensible yet easy-to-read, what-you-see-is-what-you-get
    plaintext markup syntax.

    Only the low-level syntax of docstrings is addressed here.  This
    PEP is not concerned with docstring semantics or processing at
    all.  Nor is it an attempt to deprecate pure plaintext docstrings,
    which are always going to be legitimate.  The reStructuredText
    markup is an alternative for those who want more expressive
    docstrings.


Benefits

    Programmers are by nature a lazy breed.  We reuse code with
    functions, classes, modules, and subsystems.  Through its
    docstring syntax, Python allows us to document our code from
    within.  The "holy grail" of the Python Documentation Special
    Interest Group (Doc-SIG) [2]_ has been a markup syntax and toolset
    to allow auto-documentation, where the docstrings of Python
    systems can be extracted in context and processed into useful,
    high-quality documentation for multiple purposes.

    The proposed format (reStructuredText) is entirely readable in
    plaintext format, and many of the markup forms match common usage
    (e.g., ``*emphasis*``), so it reads quite naturally.  Yet it is
    rich enough to produce complex documents, and extensible so that
    there are few limits.

    The reStructuredText parser is available now.  The Docutils
    project is at the point where standalone reStructuredText
    documents can be converted to HTML; other output format writers
    will become available over time.  Work is progressing on a Python
    source "Reader" which will implement auto-documentation.  Authors
    of existing auto-documentation tools are encouraged to integrate
    the reStructuredText parser into their projects, or better yet, to
    join forces to produce a world-class toolset for the Python
    standard library.

    Tools will become available in the near future, which will allow
    programmers to generate HTML for online help, XML for multiple
    purposes, and perhaps eventually PDF/DocBook/LaTeX for printed
    documentation, essentially "for free" from the existing
    docstrings.  The adoption of a standard will, at the very least,
    benefit docstring processing tools by preventing further
    "reinventing the wheel".

    Eventually PyDoc, the one existing standard auto-documentation
    tool, could have reStructuredText support added.  In the interim
    it will have no problem with reStructuredText markup, since it
    treats all docstrings as plaintext.


Goals

    These are the generally accepted goals for a docstring format, as
    discussed in the Doc-SIG:

    1. It must be readable in source form by the casual observer.

    2. It must be easy to type with any standard text editor.

    3. It must not need to contain information which can be deduced
       from parsing the module.

    4. It must contain sufficient information (structure) so it can be
       converted to any reasonable markup format.

    5. It must be possible to write a module's entire documentation in
       docstrings, without feeling hampered by the markup language.

    reStructuredText meets and exceeds all of these goals, and sets
    its own goals as well, even more stringent.  See "Features" below.

    The goals of this PEP are as follows:

    1. To establish reStructuredText as a standard docstring format by
       attaining "accepted" status (Python community consensus; BDFL
       pronouncement).  Once reStructuredText is a Python standard,
       effort can be focused on tools instead of arguing for a
       standard.  Python needs a standard set of documentation tools.

    2. To address any related concerns raised by the Python community.

    3. To encourage community support.  As long as multiple competing
       markups are out there, the development community remains
       fractured.  Once a standard exists, people will start to use
       it, and momentum will inevitably gather.

    4. To consolidate efforts from related auto-documentation
       projects.  It is hoped that interested developers will join
       forces and work on a joint/merged/common implementation.

    5. To adopt reStructuredText as the standard markup for PEPs.  One
       or both of the following strategies may be applied:

       a) Keep the existing PEP section structure constructs (one-line
          section headers, indented body text).  Subsections can
          either be forbidden or supported with underlined headers in
          the indented body text.

       b) Replace the PEP section structure constructs with the
          reStructuredText syntax.  Section headers will require
          underlines, subsections will be supported out of the box,
          and body text need not be indented (except for block
          quotes).

       Support for RFC 2822 headers will be added to the
       reStructuredText parser (unambiguous given a specific context:
       the first contiguous block of a PEP document).  It may be
       desired to concretely specify what over/underline styles are
       allowed for PEP section headers, for uniformity.

    6. To adopt reStructuredText as the standard markup for
       README-type files and other standalone documents in the Python
       distribution.


Rationale

    The lack of a standard syntax for docstrings has hampered the
    development of standard tools for extracting and converting
    docstrings into documentation in standard formats (e.g., HTML,
    DocBook, TeX).  There have been a number of proposed markup
    formats and variations, and many tools tied to these proposals,
    but without a standard docstring format they have failed to gain a
    strong following and/or floundered half-finished.

    Throughout the existence of the Doc-SIG, consensus on a single
    standard docstring format has never been reached.  A lightweight,
    implicit markup has been sought, for the following reasons (among
    others):

    1. Docstrings written within Python code are available from within
       the interactive interpreter, and can be 'print'ed.  Thus the
       use of plaintext for easy readability.

    2. Programmers want to add structure to their docstrings, without
       sacrificing raw docstring readability.  Unadorned plaintext
       cannot be transformed ('up-translated') into useful structured
       formats.

    3. Explicit markup (like XML or TeX) is widely considered
       unreadable by the uninitiated.

    4. Implicit markup is aesthetically compatible with the clean and
       minimalist Python syntax.

    Proposed alternatives have included:

    - XML [3]_, SGML [4]_, DocBook [5]_, HTML [6]_, XHTML [7]_

      XML and SGML are explicit, well-formed meta-languages suitable
      for all kinds of documentation.  XML is a variant of SGML.  They
      are best used behind the scenes, because they are verbose,
      difficult to type, and too cluttered to read comfortably as
      source.  DocBook, HTML, and XHTML are all applications of SGML
      and/or XML, and all share the same basic syntax and the same
      shortcomings.

    - TeX [8]_

      TeX is similar to XML/SGML in that it's explicit, not very easy
      to write, and not easy for the uninitiated to read.

    - Perl POD [9]_

      Most Perl modules are documented in a format called POD -- Plain
      Old Documentation.  This is an easy-to-type, very low level
      format with strong integration with the Perl parser.  Many tools
      exist to turn POD documentation into other formats: info, HTML
      and man pages, among others.  However, the POD syntax takes
      after Perl itself in terms of readability.

    - JavaDoc [10]_

      Special comments before Java classes and functions serve to
      document the code.  The program that extracts these and turns
      them into HTML documentation is called javadoc, and is part of
      the standard Java distribution.  However, the only output format
      that is supported is HTML, and JavaDoc has a very intimate
      relationship with HTML, using HTML tags for most markup.  Thus
      it shares the readability problems of HTML.

    - Setext [11]_, StructuredText [12]_

      Early on, variants of Setext (Structure Enhanced Text),
      including Zope Corp's StructuredText, were proposed for Python
      docstring formatting.  Hereafter these variants will
      collectively be called 'STexts'.  STexts have the advantage of
      being easy to read without special knowledge, and relatively
      easy to write.

      Although used by some (including in most existing Python
      auto-documentation tools), until now STexts have failed to
      become standard because:

      - STexts have been incomplete.  Lacking "essential" constructs
        that people want to use in their docstrings, STexts are
        rendered less than ideal.  Note that these "essential"
        constructs are not universal; everyone has their own
        requirements.

      - STexts have sometimes been surprising.  Bits of text are
        unexpectedly interpreted as being marked up, leading to user
        frustration.

      - SText implementations have been buggy.

      - Most STexts have had no formal specification except for
        the implementation itself.  A buggy implementation meant a
        buggy spec, and vice-versa.

      - There has been no mechanism to get around the SText markup
        rules when a markup character is used in a non-markup context.

    Proponents of implicit STexts have vigorously opposed proposals
    for explicit markup (XML, HTML, TeX, POD, etc.), and the debates
    have continued off and on since 1996 or earlier.

    reStructuredText is a complete revision and reinterpretation of
    the SText idea, addressing all of the problems listed above.


Features

    Rather than repeating or summarizing the extensive
    reStructuredText spec, please read the originals available from
    http://structuredtext.sourceforge.net/spec/ (.txt & .html files).
    Reading the documents in the following order is recommended:

    - An Introduction to reStructuredText [13]_

    - Problems With StructuredText [14]_ (optional for those who have
      used StructuredText; it explains many markup decisions made)

    - reStructuredText Markup Specification [15]_

    - A Record of reStructuredText Syntax Alternatives [16]_ (explains
      markup decisions made independently of StructuredText)

    - reStructuredText Directives [17]_

    There is also a "Quick reStructuredText" user reference [18]_.

    A summary of features addressing often-raised docstring markup
    concerns follows:

    - A markup escaping mechanism.

      Backslashes (``\``) are used to escape markup characters when
      needed for non-markup purposes.  However, the inline markup
      recognition rules have been constructed in order to minimize the
      need for backslash-escapes.  For example, although asterisks are
      used for *emphasis*, in non-markup contexts such as "*" or "(*)"
      or "x * y", the asterisks are not interpreted as markup and are
      left unchanged.  For many non-markup uses of backslashes (e.g.,
      describing regular expressions), inline literals or literal
      blocks are applicable; see the next item.
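
      The recognition rules described above can be approximated in a
      few lines.  What follows is a simplified sketch, not the
      reStructuredText parser itself: the helper names
      (``find_emphasis`` and friends) and the exact character sets are
      assumptions made for illustration, and only single-asterisk
      emphasis is considered.

```python
OPENERS = "'\"([{<"
CLOSERS = "'\")]}>"
# Characters allowed right after a closing "*" (an assumed, simplified set).
END_FOLLOWERS = CLOSERS + ".,:;!?-"


def _neighbours(text, i):
    """Return the characters before and after index i (space at the edges)."""
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    return before, after


def is_start(text, i):
    """Can the "*" at index i open an emphasis span?"""
    before, after = _neighbours(text, i)
    if not (before.isspace() or before in OPENERS):
        return False
    if after.isspace() or after == "*":
        return False
    # A "*" wrapped in a matching pair, as in "(*)", stays literal.
    if before in OPENERS and after == CLOSERS[OPENERS.index(before)]:
        return False
    return True


def is_end(text, i):
    """Can the "*" at index i close an emphasis span?"""
    before, after = _neighbours(text, i)
    return not before.isspace() and (after.isspace() or after in END_FOLLOWERS)


def find_emphasis(text):
    """Return the contents of every *emphasis* span found in text."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] == "*" and is_start(text, i):
            j = i + 2  # require at least one character of content
            while j < len(text) and not (text[j] == "*" and is_end(text, j)):
                j += 1
            if j < len(text):
                spans.append(text[i + 1:j])
                i = j + 1
                continue
        i += 1
    return spans
```

      With these rules, ``find_emphasis`` picks out the word in
      "used for \*emphasis\*," but finds nothing in "\*", "(\*)" or
      "x \* y", so none of those needs a backslash in real markup.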

    - Markup to include Python source code and Python interactive
      sessions: inline literals, literal blocks, and doctest blocks.

      Inline literals use ``double-backquotes`` to indicate program
      I/O or code snippets.  No markup interpretation (including
      backslash-escape [``\``] interpretation) is done within inline
      literals.

      Literal blocks (block-level literal text, such as code excerpts
      or ASCII graphics) are indented, and indicated with a
      double-colon ("::") at the end of the preceding paragraph (right
      here -->)::

          if literal_block:
              text = 'is left as-is'
              spaces_and_linebreaks = 'are preserved'
              markup_processing = None

      Doctest blocks begin with ">>> " and end with a blank line.
      Neither indentation nor literal block double-colons are
      required.  For example::

          Here's a doctest block:

          >>> print 'Python-specific usage examples; begun with ">>>"'
          Python-specific usage examples; begun with ">>>"
          >>> print '(cut and pasted from interactive sessions)'
          (cut and pasted from interactive sessions)
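
      Because doctest blocks are ordinary interactive sessions, the
      standard library's ``doctest`` module can check them.  The
      sketch below is only an illustration: ``double`` and
      ``run_docstring_examples`` are hypothetical names, and the
      session is written in current Python syntax.

```python
import doctest


def double(x):
    """
    Return `x` doubled.

    >>> double(21)
    42
    >>> double('ab')
    'abab'
    """
    return 2 * x


def run_docstring_examples(func):
    """Run the doctest blocks in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    globs = {func.__name__: func}
    # Arguments: docstring, globals, test name, filename, line number.
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

      Calling ``run_docstring_examples(double)`` runs both examples
      and reports how many failed and how many were attempted.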

    - Markup that isolates a Python identifier: interpreted text.

      Text enclosed in single backquotes is recognized as "interpreted
      text", whose interpretation is application-dependent.  In the
      context of a Python docstring, the default interpretation of
      interpreted text is as Python identifiers.  The text will be
      marked up with a hyperlink connected to the documentation for
      the identifier given.  Lookup rules are the same as in Python
      itself: LGB namespace lookups (local, global, builtin).  The
      "role" of the interpreted text (identifying a class, module,
      function, etc.) is determined implicitly from the namespace
      lookup.  For example::

          class Keeper(Storer):

              """
              Keep data fresher longer.

              Extend `Storer`.  Class attribute `instances` keeps track
              of the number of `Keeper` objects instantiated.
              """

              instances = 0
              """How many `Keeper` objects are there?"""

              def __init__(self):
                  """
                  Extend `Storer.__init__()` to keep track of
                  instances.  Keep count in `Keeper.instances` and data
                  in `self.data`.
                  """
                  Storer.__init__(self)
                  Keeper.instances += 1

                  self.data = []
                  """Store data in a list, most recent last."""

              def storedata(self, data):
                  """
                  Extend `Storer.storedata()`; append new `data` to a
                  list (in `self.data`).
                  """
                  self.data.append(data)

      Each piece of interpreted text is looked up according to the
      local namespace of the block containing its docstring.

    - Markup that isolates a Python identifier and specifies its type:
      interpreted text with roles.

      Although the Python source context reader is designed not to
      require explicit roles, they may be used.  To classify
      identifiers explicitly, the role is given along with the
      identifier in either prefix or suffix form::

          Use :method:`Keeper.storedata` to store the object's data in
          `Keeper.data`:instance_attribute:.

      The syntax chosen for roles is verbose, but necessarily so (if
      anyone has a better alternative, please post it to the Doc-SIG).
      The intention of the markup is that there should be little need
      to use explicit roles; their use is to be kept to an absolute
      minimum.
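
      Both forms are easy to recognize mechanically.  The sketch below
      is an assumption-laden illustration rather than the parser's
      actual rule: the ``ROLE`` pattern and ``explicit_roles`` helper
      are hypothetical, and the role names in the sample are only
      examples.

```python
import re

# Either ":role:`text`" (prefix form) or "`text`:role:" (suffix form).
ROLE = re.compile(
    r":(?P<prefix>\w+):`(?P<text1>[^`]+)`"
    r"|`(?P<text2>[^`]+)`:(?P<suffix>\w+):"
)


def explicit_roles(text):
    """Return (role, identifier) pairs for interpreted text with explicit roles."""
    pairs = []
    for match in ROLE.finditer(text):
        role = match.group("prefix") or match.group("suffix")
        ident = match.group("text1") or match.group("text2")
        pairs.append((role, ident))
    return pairs


sample = "Use :method:`Keeper.storedata` or `Keeper.data`:instance_attribute:."
pairs = explicit_roles(sample)
```

      Interpreted text without a role is skipped by this pattern, which
      matches the intent that explicit roles remain the exception.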

    - Markup for "tagged lists" or "label lists": field lists.

      Field lists represent a mapping from field name to field body.
      These are mostly used for extension syntax, such as
      "bibliographic field lists" (representing document metadata such
      as author, date, and version) and extension attributes for
      directives (see below).  They may be used to implement docstring
      semantics, such as identifying parameters, exceptions raised,
      etc.; such usage is beyond the scope of this PEP.

      A modified RFC 2822 syntax is used, with a colon *before* as
      well as *after* the field name.  Field bodies are more versatile
      as well; they may contain multiple body elements (even nested
      field lists).  For example::

          :Date: 2002-03-22
          :Version: 1
          :Authors:
              - Me
              - Myself
              - I

      Standard RFC 2822 header syntax cannot be used for this
      construct because it is ambiguous.  A word followed by a colon
      at the beginning of a line is common in written text.
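
      The leading colon is what makes field names unambiguous.  As a
      rough sketch (the ``FIELD`` pattern and ``parse_field_list``
      helper are hypothetical, and nested body elements are kept as
      plain text rather than parsed), a field list can be read like
      this:

```python
import re

# ":name:" at the start of a line, optionally followed by body text.
FIELD = re.compile(r"^:(?P<name>[^:\s][^:]*):(?:\s+(?P<body>.*))?$")


def parse_field_list(text):
    """Return {field name: field body}; indented lines continue the body."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            current = match.group("name")
            first = match.group("body")
            fields[current] = [first] if first else []
        elif current is not None and line[:1].isspace() and line.strip():
            fields[current].append(line.strip())
        else:
            current = None  # a blank or unindented line ends the field
    return {name: "\n".join(body) for name, body in fields.items()}


sample = """\
:Date: 2002-03-22
:Version: 1
:Authors:
    - Me
    - Myself
    - I
"""
fields = parse_field_list(sample)
```

      An ordinary sentence such as "Note: see below" has no leading
      colon, so it is never mistaken for a field.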

    - Markup extensibility: directives and substitutions.

      Directives are used as an extension mechanism for
      reStructuredText, a way of adding support for new block-level
      constructs without adding new syntax.  Directives for images,
      admonitions (note, caution, etc.), and tables of contents
      generation (among others) have been implemented.  For example,
      here's how to place an image::

          .. image:: mylogo.png

      Substitution definitions allow the power and flexibility of
      block-level directives to be shared by inline text.  For
      example::

          The |biohazard| symbol must be used on containers used to
          dispose of medical waste.

          .. |biohazard| image:: biohazard.png

    - Section structure markup.

      Section headers in reStructuredText use adornment via underlines
      (and possibly overlines) rather than indentation.  For example::

          This is a Section Title
          =======================

          This is a Subsection Title
          --------------------------

          This paragraph is in the subsection.

          This is Another Section Title
          =============================

          This paragraph is in the second section.
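
      Underlined titles are simple to detect, and the nesting level
      follows from the order in which underline styles first appear.
      The following is a minimal sketch under those assumptions;
      ``is_adornment``, ``find_sections`` and ``titled`` are
      hypothetical helpers, and overlines are ignored.

```python
def is_adornment(line):
    """True for a line made of one repeated punctuation character."""
    stripped = line.rstrip()
    return (
        len(stripped) >= 2
        and len(set(stripped)) == 1
        and not stripped[0].isalnum()
        and not stripped[0].isspace()
    )


def find_sections(text):
    """Return (level, title) pairs for underlined titles in text."""
    lines = text.splitlines()
    styles = []  # underline characters, in order of first appearance
    sections = []
    for title, underline in zip(lines, lines[1:]):
        if (
            title.strip()
            and not is_adornment(title)
            and is_adornment(underline)
            and len(underline.rstrip()) >= len(title.rstrip())
        ):
            char = underline[0]
            if char not in styles:
                styles.append(char)
            sections.append((styles.index(char) + 1, title.strip()))
    return sections


def titled(title, char):
    """Build a title followed by an underline of the same length."""
    return title + "\n" + char * len(title)


sample = "\n\n".join([
    titled("This is a Section Title", "="),
    titled("This is a Subsection Title", "-"),
    "This paragraph is in the subsection.",
    titled("This is Another Section Title", "="),
    "This paragraph is in the second section.",
])
sections = find_sections(sample)
```

      For the sample document this yields two level-1 sections with
      one level-2 subsection inside the first.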

Questions & Answers

    Q1: Is reStructuredText rich enough?

    A1: Yes, it is for most people.  If it lacks some construct that
        is required for a specific application, it can be added via
        the directive mechanism.  If a common construct has been
        overlooked and a suitably readable syntax can be found, it can
        be added to the specification and parser.

    Q2: Is reStructuredText *too* rich?

    A2: For specific applications or individuals, perhaps.  In
        general, no.

        Since the very beginning, whenever a markup syntax has been
        proposed on the Doc-SIG, someone has complained about the lack
        of support for some construct or other.  The reply was often
        something like, "These are docstrings we're talking about, and
        docstrings shouldn't have complex markup."  The problem is
        that a construct that seems superfluous to one person may be
        absolutely essential to another.

        reStructuredText takes the opposite approach: it provides a
        rich set of implicit markup constructs (plus a generic
        extension mechanism for explicit markup), allowing for all
        kinds of documents.  If the set of constructs is too rich for
        a particular application, the unused constructs can either be
        removed from the parser (via application-specific overrides)
        or simply omitted by convention.

    Q3: Why not use indentation for section structure, like
        StructuredText does?  Isn't it more "Pythonic"?

    A3: Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG
        post:

            I still think that using indentation to indicate
            sectioning is wrong.  If you look at how real books and
            other print publications are laid out, you'll notice that
            indentation is used frequently, but mostly at the
            intra-section level.  Indentation can be used to offset
            lists, tables, quotations, examples, and the like.  (The
            argument that docstrings are different because they are
            input for a text formatter is wrong: the whole point is
            that they are also readable without processing.)

            I reject the argument that using indentation is Pythonic:
            text is not code, and different traditions and conventions
            hold.  People have been presenting text for readability
            for over 30 centuries.  Let's not innovate needlessly.

        See "Section Structure via Indentation" in "Problems With
        StructuredText" [14]_ for further elaboration.

    Q4: Why use reStructuredText for PEPs?  What's wrong with the
        existing standard?

    A4: The existing standard for PEPs is very limited in terms of
        general expressibility, and referencing is especially lacking
        for such a reference-rich document type.  PEPs are currently
        converted into HTML, but the results (mostly monospaced text)
        are less than attractive, and most of the value-added
        potential of HTML is untapped.

        Making reStructuredText the standard markup for PEPs will
        enable much richer expression, including support for section
        structure, inline markup, graphics, and tables.  In several
        PEPs there are ASCII graphics diagrams, which are all that
        plaintext documents can support.  Since PEPs are made
        available in HTML form, the ability to include proper diagrams
        would be immediately useful.

        Current PEP practices allow for reference markers in the form
        "[1]" in the text, and the footnotes/references themselves are
        listed in a section toward the end of the document.  There is
        currently no hyperlinking between the reference marker and the
        footnote/reference itself (it would be possible to add this to
        pep2html.py, but the "markup" as it stands is ambiguous and
        mistakes would be inevitable).  A PEP with many references
        (such as this one ;-) requires a lot of flipping back and
        forth.  When revising a PEP, often new references are added or
        unused references deleted.  It is painful to renumber the
        references, since it has to be done in two places and can have
        a cascading effect (insert a single new reference 1, and every
        other reference has to be renumbered; always adding new
        references to the end is suboptimal).  It is easy for
        references to go out of sync.

        PEPs use references for two purposes: simple URL references
        and footnotes.  reStructuredText differentiates between the
        two.  A PEP might contain references like this::


            Abstract

                This PEP proposes adding frungible doodads [1] to the
                core.  It extends PEP 9876 [2] via the BCA [3]
                mechanism.

            References and Footnotes

                [1] http://www.example.org/

                [2] PEP 9876, Let's Hope We Never Get Here
                    http://www.python.org/peps/pep-9876.html

                [3] "Bogus Complexity Addition"

        Reference 1 is a simple URL reference.  Reference 2 is a
        footnote containing text and a URL.  Reference 3 is a footnote
        containing text only.  Rewritten using reStructuredText, this
        PEP could look like this::


            Abstract
            ========

            This PEP proposes adding `frungible doodads`_ to the
            core.  It extends PEP 9876 [#pep9876]_ via the BCA [#]_
            mechanism.

            .. _frungible doodads: http://www.example.org/

            .. [#pep9876] `PEP 9876`__, Let's Hope We Never Get Here

            __ http://www.python.org/peps/pep-9876.html

            .. [#] "Bogus Complexity Addition"

        URLs and footnotes can be defined close to their references if
        desired, making them easier to read in the source text, and
        making the PEPs easier to revise.  The "References and
        Footnotes" section can be auto-generated with a document tree
        transform.  Footnotes from throughout the PEP would be
        gathered and displayed under a standard header.  If URL
        references should likewise be written out explicitly (in
        citation form), another tree transform could be used.

        URL references can be named ("frungible doodads"), and can be
        referenced from multiple places in the document without
        additional definitions.  When converted to HTML, references
        will be replaced with inline hyperlinks (HTML <A> tags).  The
        two footnotes are automatically numbered, so they will always
        stay in sync.  The first footnote also contains an internal
        reference name, "pep9876", so it's easier to see the
        connection between reference and footnote in the source text.
        Named footnotes can be referenced multiple times, maintaining
        consistent numbering.
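
        The numbering behaviour can be sketched as follows.  This is a
        simplified illustration, not the Docutils transform: the
        ``FOOTNOTE_REF`` pattern and ``number_footnote_refs`` helper
        are hypothetical, and numbers are assigned purely in order of
        first reference.

```python
import re

# "[#]_" (anonymous) or "[#label]_" (named) auto-numbered footnote references.
FOOTNOTE_REF = re.compile(r"\[#(\w*)\]_")


def number_footnote_refs(text):
    """Replace auto-numbered footnote references with "[n]" markers."""
    numbers = {}  # label -> number already assigned to it
    count = 0

    def replace(match):
        nonlocal count
        label = match.group(1)
        if label in numbers:
            # A named footnote referenced again keeps its number.
            return "[%d]" % numbers[label]
        count += 1
        if label:
            numbers[label] = count
        return "[%d]" % count

    return FOOTNOTE_REF.sub(replace, text)


source = "PEP 9876 [#pep9876]_ via the BCA [#]_ mechanism (see [#pep9876]_)."
numbered = number_footnote_refs(source)
```

        Inserting a new reference earlier in the text simply shifts
        the generated numbers; nothing in the source has to be
        renumbered by hand.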

        The "#pep9876" footnote could also be written in the form of a
        citation::

            It extends PEP 9876 [PEP9876]_ ...

            .. [PEP9876] `PEP 9876`_, Let's Hope We Never Get Here

        Footnotes are numbered, whereas citations use text for their
        references.

    Q5: Wouldn't it be better to keep the docstring and PEP proposals
        separate?

    A5: The PEP markup proposal may be removed if it is deemed that
        there is no need for PEP markup, or it could be made into a
        separate PEP.  If accepted, PEP 1, PEP Purpose and Guidelines
        [19]_, and PEP 9, Sample PEP Template [20]_ will be updated.

        It seems natural to adopt a single consistent markup standard
        for all uses of structured plaintext in Python, and to propose
        it all in one place.

    Q6: The existing pep2html.py script converts the existing PEP
        format to HTML.  How will the new-format PEPs be converted to
        HTML?

    A6: One of the deliverables of the Docutils project [21]_ will be
        a new version of pep2html.py with integrated reStructuredText
        parsing.  The Docutils project will support PEPs with a "PEP
        Reader" component, including all functionality currently in
        pep2html.py (auto-recognition of PEP & RFC references).

    Q7: Who's going to convert the existing PEPs to reStructuredText?

    A7: A call for volunteers will be put out to the Doc-SIG and
        greater Python communities.  If insufficient volunteers are
        forthcoming, I (David Goodger) will convert the documents
        myself, perhaps with some level of automation.  A transitional
        system whereby both old and new standards can coexist will be
        easy to implement (and I pledge to implement it if necessary).

    Q8: Why use reStructuredText for README and other ancillary files?

    A8: The reasoning given for PEPs in A4 above also applies to
        README and other ancillary files.  By adopting a standard
        markup, these files can be converted to attractive
        cross-referenced HTML and put up on python.org.  Developers of
        Python projects can also take advantage of this facility for
        their own documentation.

    Q9: Won't the superficial similarity to existing markup
        conventions cause problems, and result in people writing
        invalid markup (and not noticing, because the plaintext looks
        natural)?  How forgiving is reStructuredText of "not quite
        right" markup?

    A9: There will be some mis-steps, as there would be when moving
        from one programming language to another.  As with any
        language, proficiency grows with experience.  Luckily,
        reStructuredText is a very little language indeed.

        As with any syntax, there is the possibility of syntax errors.
        It is expected that a user will run the processing system over
        their input and check the output for correctness.

        In a strict sense, the reStructuredText parser is very
        unforgiving (as it should be; "In the face of ambiguity,
        refuse the temptation to guess" [22]_ applies to parsing
        markup as well as computer languages).  Here's a design goal
        from "An Introduction to reStructuredText" [13]_:

            3. Unambiguous.  The rules for markup must not be open for
               interpretation.  For any given input, there should be
               one and only one possible output (including error
               output).

        While unforgiving, at the same time the parser does try to be
        helpful by producing useful diagnostic output ("system
        messages").  The parser reports problems, indicating their
        level of severity (from least to most: debug, info, warning,
        error, severe).  The user or the client software can decide on
        reporting thresholds; they can ignore low-level problems or
        cause high-level problems to bring processing to an immediate
        halt.  Problems are reported during the parse as well as
        included in the output, often with two-way links between the
        source of the problem and the system message explaining it.
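
        The threshold idea can be illustrated with a small sketch.
        Only the five severity names come from the text above; the
        ``Reporter`` class, its parameters and the ``ProcessingHalted``
        exception are hypothetical and are not the parser's actual
        interface.

```python
# Severity levels, from least to most severe.
LEVELS = ("debug", "info", "warning", "error", "severe")


class ProcessingHalted(Exception):
    """Raised when a message reaches the halt threshold."""


class Reporter:
    """Collect system messages, filtering and halting by severity."""

    def __init__(self, report_level="warning", halt_level="severe"):
        self.report_rank = LEVELS.index(report_level)
        self.halt_rank = LEVELS.index(halt_level)
        self.messages = []

    def report(self, level, text):
        rank = LEVELS.index(level)
        if rank >= self.report_rank:
            # At or above the reporting threshold: keep the message.
            self.messages.append((level, text))
        if rank >= self.halt_rank:
            # At or above the halt threshold: stop processing at once.
            raise ProcessingHalted("%s: %s" % (level, text))


reporter = Reporter(report_level="warning", halt_level="error")
reporter.report("info", "ignored, below the reporting threshold")
reporter.report("warning", "recorded, processing continues")
```

        A later ``reporter.report("error", ...)`` call would be
        recorded and then raise ``ProcessingHalted``, since "error"
        meets the chosen halt level.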

    Q10: Will the docstrings in the Python standard library modules be
         converted to reStructuredText?

    A10: Over time, with the help of the developer community, many
         modules will be converted.  Some modules may never be
         converted.  A future toolset will have to allow for
         this eventuality.

References & Footnotes

    [1] http://structuredtext.sourceforge.net/

    [2] http://www.python.org/sigs/doc-sig/

    [3] http://www.w3.org/XML/

    [4] http://www.oasis-open.org/cover/general.html

    [5] http://docbook.org/tdg/en/html/docbook.html

    [6] http://www.w3.org/MarkUp/

    [7] http://www.w3.org/MarkUp/#xhtml1

    [8] http://www.tug.org/interest.html

    [9] http://www.perldoc.com/perl5.6/pod/perlpod.html

    [10] http://java.sun.com/j2se/javadoc/

    [11] http://docutils.sourceforge.net/mirror/setext.html

    [12] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage

    [13] An Introduction to reStructuredText

    [14] Problems with StructuredText

    [15] reStructuredText Markup Specification

    [16] A Record of reStructuredText Syntax Alternatives

    [17] reStructuredText Directives

    [18] Quick reStructuredText

    [19] PEP 1, PEP Guidelines, Warsaw, Hylton

    [20] PEP 9, Sample PEP Template, Warsaw

    [21] http://docutils.sourceforge.net/

    [22] From "The Zen of Python (by Tim Peters)",

    [23] PEP 216, Docstring Format, Zadka


Copyright

    This document has been placed in the public domain.


Acknowledgements

    Some text is borrowed from PEP 216, Docstring Format [23]_, by
    Moshe Zadka.

    Special thanks to all members past & present of the Python Doc-SIG.
