[Doc-SIG] PEP: Docstring Processing System Framework

David Goodger dgoodger@bigfoot.com
Sun, 03 Jun 2001 10:30:08 -0400

Hello all,

After much thought and reorganization*, I am pleased to offer this candidate
PEP. Two related candidate PEPs follow, all part of the Docstring Processing
System (DPS) project. I will be seeking PEP numbers ASAP; once obtained, I
will post them to comp.lang.python.

The website for this project is http://docstring.sf.net. The three PEPs
posted here, plus supporting XML DTDs, are available for individual browsing
there. I've released version 0.1 of the project, which contains all
specification files as well as the code. Apart from one module
(dps.statemachine, useful for line-based parsing using regular expressions),
there is only a skeleton of the core system. I welcome any input.

Several similar projects already exist. I invite their authors to take a
look at the approach presented in this PEP and to consider consolidating our
efforts. I will be happy to add developers and project admins to the
SourceForge project; please let me know if you are interested. I would like
this to be an open, community project!

I've simultaneously released the reStructuredText project, an input parser
component for the DPS, at http://structuredtext.sf.net. I'll be posting the
updated specification to Doc-SIG shortly. My hope is that these projects
will form the foundation for a standard documentation tool for Python.

* See the "History" section of

David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - Python Docstring Processing System: http://docstring.sf.net
 - reStructuredText: http://structuredtext.sf.net
 - The Go Tools Project: http://gotools.sf.net


PEP: ???
Title: Docstring Processing System Framework
Version: $Revision$
Author: dgoodger@bigfoot.com (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Standards Track
Requires: (Docstring Conventions PEP),
          (DPS Generic Implementation Details PEP)
Created: 01-Jun-2001


Abstract

    Python modules, classes and functions have a string attribute called
    __doc__. If the first expression inside the definition is a literal
    string, that string is assigned to the __doc__ attribute, called a
    documentation string or docstring. It is often used to summarize the
    interface of the module, class or function.
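
    As a small illustration of the behaviour just described (the function
    names and their text are invented for this example only):

```python
def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers.
    """
    return width * height


def undocumented():
    # A comment is not an expression, so the first statement here is
    # the 'return' below and no docstring is created.
    return None


# The literal string that opens the function body is stored on __doc__.
summary = area.__doc__.splitlines()[0]
print(summary)

# Without a leading string literal, __doc__ is None.
print(undocumented.__doc__)
```

    Only a literal string in first position is picked up; comments and
    strings appearing later in the body are ignored.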

    There is no standard format (markup) for docstrings, nor are there
    standard tools for extracting docstrings and transforming them into
    useful structured formats (e.g., HTML, DocBook, TeX). Those tools that
    do exist are for the most part unmaintained and unused. The issues
    surrounding docstring processing have been contentious and difficult to
    resolve.

    This PEP proposes a Docstring Processing System (DPS) framework. It
    separates out the components (program and conceptual), enabling the
    resolution of individual issues either through consensus (one solution)
    or through divergence (many). It promotes standard interfaces which
    will allow a variety of plug-in components (e.g., input parsers and
    output formatters) to be used.

    This PEP presents the concepts of a DPS framework independently of
    implementation details.


Copyright

    This document has been placed in the public domain.


Acknowledgments

    This document borrows text from PEP 216 "Docstring Format" by Moshe
    Zadka [1]. It is intended as a reorganization of PEP 216 and its

    This document also borrows ideas from the archives of the Python
    Doc-SIG. Thanks to all members past & present.

Project Website

    A SourceForge project has been set up for this work at
    http://docstring.sf.net.


Rationale

    Python lends itself to inline documentation. With its built-in
    docstring syntax, a limited form of Literate Programming [2] is easy to
    do in Python. However, there are no satisfactory standard tools for
    extracting and processing Python docstrings. The lack of a standard
    toolset is a significant gap in Python's infrastructure; this PEP aims
    to fill the gap.

    There are standard inline documentation systems for some other
    languages. For example, Perl has POD (plain old documentation) and Java
    has Javadoc, but neither of these meshes with the Pythonic way. POD is
    very explicit, but takes after Perl in terms of readability. Javadoc is
    HTML-centric; except for '@field' tags, raw HTML is used for markup.
    There are also general tools such as Autoduck and Web (Tangle & Weave),
    useful for multiple languages.

    There have been many attempts to write autodocumentation systems for
    Python (not an exhaustive list):

    - Marc-Andre Lemburg's doc.py [3]

    - Daniel Larsson's pythondoc & gendoc [4]

    - Doug Hellmann's HappyDoc [5]

    - Laurence Tratt's Crystal [6]

    - Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python
      standard library; see below)

    - Tony Ibbs' docutils [8]

    These systems, each with different goals, have had varying degrees of
    success. A problem with many of the above systems was overambition.
    They provided a self-contained set of components: a docstring
    extraction system, an input parser, an internal processing system and
    one or more output formatters. Inevitably, one or more components had
    serious shortcomings, preventing the system from being adopted as a
    standard tool.

    Throughout the existence of the Python Documentation Special Interest
    Group (Doc-SIG) [9], consensus on a single standard docstring format
    has never been reached. A lightweight, implicit markup has been sought,
    for the following reasons (among others):

    1. Docstrings written within Python code are available from within the
       interactive interpreter, and can be 'print'ed. Thus the use of
       plaintext for easy readability.

    2. Programmers want to add structure to their docstrings, without
       sacrificing raw docstring readability. Unadorned plaintext cannot be
       transformed ('up-translated') into useful structured formats.

    3. Explicit markup (like XML or TeX) has been widely considered
       unreadable by the uninitiated.

    4. Implicit markup is aesthetically compatible with the clean and
       minimalist Python syntax.

    Early on, variants of Setext (Structure Enhanced Text) [10], including
    Digital Creations' StructuredText [11], were proposed for Python
    docstring formatting. Hereafter we will collectively call these
    variants 'STexts'. Although used by some (including in most of the
    above-listed autodocumentation tools), these markup schemes have failed
    to become standard because:

    - STexts have been incomplete: lacking 'essential' constructs that
      people want to use in their docstrings, STexts are rendered less than
      ideal. Note that these 'essential' constructs are not universal;
      everyone has their own requirements.

    - STexts have sometimes been surprising: bits of text are marked up
      unexpectedly, leading to user frustration.

    - SText implementations have been buggy.

    - Some STexts have had no formal specification except for the
      implementation itself. A buggy implementation meant a buggy spec, and
      vice versa.

    - There has been no mechanism to get around the SText markup rules when
      a markup character is used in a non-markup context.
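
    The "surprising" and "no escape mechanism" problems can be sketched
    with a deliberately naive rule. The rule below (text between a pair of
    asterisks becomes emphasis) and the naive_markup name are assumptions
    made for illustration; they are not taken from any actual SText
    implementation:

```python
import re

# Hypothetical implicit-markup rule: *text* means emphasis.
EMPHASIS = re.compile(r"\*(.+?)\*")


def naive_markup(text):
    """Wrap every asterisk-delimited run of text in <em> tags."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


# The case the rule was written for.
intended = naive_markup("Return the *absolute* value.")
print(intended)

# The asterisks here mean multiplication, but the rule cannot tell, and
# the docstring author has no way to say so.
surprise = naive_markup("Computes a*b + c*d.")
print(surprise)
```

    In the second call the text between the two multiplication signs is
    silently turned into emphasis, which is the kind of result that has
    frustrated users.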

    Recognizing the deficiencies of STexts, some people have proposed using
    explicit markup of some kind. There have been proposals for using XML,
    HTML, TeX, POD, and Javadoc at one time or another. Proponents of
    STexts have vigorously opposed these proposals, and the debates have
    continued off and on for at least five years.

    It has become clear (to this author, at least) that the "all or
    nothing" approach cannot succeed, since no all-encompassing proposal
    could possibly be agreed upon by all interested parties. A modular
    component approach, where components may be multiply implemented, is
    the only chance at success. By separating out the issues, we can form
    consensus more easily (smaller fights ;-), and accept divergence more
    readily.

    Each of the components of a docstring processing system should be
    developed independently. A 'best of breed' system should be chosen
    and/or developed and eventually included in Python's standard library.

Pydoc & Other Existing Systems

    Pydoc is part of the Python 2.1 standard library. It extracts and
    displays docstrings from within the Python interactive interpreter,
    from the shell command line, and from a GUI window into a web browser
    (HTML). In the case of GUI/HTML, except for some heuristic hyperlinking
    of identifier names, no formatting of the docstrings is done. They are
    presented within <p><small><tt> tags to avoid unwanted line wrapping.
    Unfortunately, the result is not pretty.
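
    The absence of docstring formatting can be seen from a script. The
    sketch below assumes a current Python 3 pydoc (not the Python 2.1
    version discussed here) and uses an invented function, scale:

```python
import pydoc


def scale(value, factor=2):
    """Multiply *value* by factor.

    The asterisks are markup to a human reader, but plain
    characters to pydoc.
    """
    return value * factor


# pydoc.getdoc() returns the docstring with its indentation cleaned up
# but otherwise untouched: no markup is interpreted.
raw = pydoc.getdoc(scale)

# render_doc() builds the text page that help() would display; the
# plaintext renderer avoids terminal bold sequences.
page = pydoc.render_doc(scale, renderer=pydoc.plaintext)
print(page)
```

    The asterisks around "value" come through unchanged in both results.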

    The functionality proposed in this PEP could be added to or used by
    pydoc when serving HTML pages. However, the proposed docstring
    processing system's functionality is much more than pydoc needs (in its
    current form). Either an independent tool will be developed (which
    pydoc may or may not use), or pydoc could be expanded to encompass this
    functionality and *become* the docstring processing system (or one such
    system). That decision is beyond the scope of this PEP.

    Similarly, the authors of other existing docstring processing systems
    may or may not choose compatibility with this framework.
    However, if this framework is accepted and adopted as the Python
    standard, compatibility will become an important consideration in these
    systems' future.


Specification

    The docstring processing system framework consists of components, as
    follows:

    1. Docstring conventions. Documents issues such as:

       - What should be documented where.

       - First line is a one-line synopsis.

    2. Docstring processing system generic implementation details.
       Documents issues such as:

       - High-level spec: what a DPS does.

       - Command-line interface for executable script.

       - System Python API.

       - Docstring extraction rules.

       - Input parser API.

       - Intermediate internal data structure: output from input parser,
         input to output formatter.

       - Output formatter API.

       - Output management.

       These issues are applicable to any docstring processing system.

    3. Docstring processing system implementation.

    4. Input markup specifications: docstring syntax.

    5. Input parser implementations.

    6. Output formats (HTML, XML, TeX, DocBook, info, etc.).

    7. Output formatter implementations.

    Components 1, 2, and 3 will be the subject of individual companion
    PEPs, although they may be merged into this PEP once consensus is
    reached. If there is only one implementation, PEPs for components 2 & 3
    can be combined. Multiple PEPs will be necessary for each of components
    4, 5, 6, and 7. An alternative to the PEP mechanism may be used
    instead, since these are not directly related to the Python language.
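
    One way the separation between input parsers, the intermediate data
    structure and output formatters might look is sketched below. All
    names (Node, PlainParser, HTMLFormatter, TextFormatter, process) are
    hypothetical; the actual interfaces are left to the companion PEPs,
    and this is not the DPS code:

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Hypothetical intermediate data structure: a simple tree node."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


class PlainParser:
    """Hypothetical input parser: blank lines separate paragraphs."""

    def parse(self, docstring):
        root = Node("document")
        for block in docstring.strip().split("\n\n"):
            # Collapse line breaks and indentation inside a paragraph.
            root.children.append(Node("paragraph", " ".join(block.split())))
        return root


class HTMLFormatter:
    """Hypothetical output formatter; it sees only Node trees."""

    def format(self, tree):
        return "\n".join(f"<p>{escape(c.text)}</p>" for c in tree.children)


class TextFormatter:
    """A second formatter, swapped in without touching the parser."""

    def format(self, tree):
        return "\n\n".join(c.text for c in tree.children)


def process(docstring, parser, formatter):
    """Run one docstring through a parser, then a formatter."""
    return formatter.format(parser.parse(docstring))


doc = """Summary line.

    Longer description,
    wrapped over two lines.
    """
html_out = process(doc, PlainParser(), HTMLFormatter())
text_out = process(doc, PlainParser(), TextFormatter())
print(html_out)
print(text_out)
```

    Because the formatters depend only on the tree, a different parser
    (for another markup syntax) could be substituted as long as it
    produces the same kind of tree.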

    The following diagram shows an overview of the framework (very wide--
    apologies for line wrapping; interfaces are indicated by double borders):

                                                    +===========================================+
                                                    | Command-Line Interface                    |
                                                    +===========================================+
                                                    | Executable Script                         |
                                                    +===========================================+ returns   +---------+
                                                    | System Python API                         |==========>| output  |
                              +--------+            +===========================================+           | objects |
           _    writes        | Python |      reads | Docstring Processing System               |           +---------+
          / \  ==============>| module |<===========|                                           |
          \_/                 +--------+            | input      | transformation, | output     |            +--------+
           |             +-------------+    follows | docstring  | integration,    | object     | writes     | output |
         --+--  consults | docstring   |<-----------| extraction | linking         | management |===========>| files  |
           |   --------->| conventions |            +============+=====+=====+=====+============+            +--------+
          / \            +-------------+            | parser API       |     |    formatter API |
         /   \           +-------------+            +===========+======+     +======+===========+            +--------+
        author  consults | markup      | implements | input     | intermediate      | output    | implements | output |
               --------->| syntax spec |<-----------| parser    | data structure    | formatter |----------->| format |
                         +-------------+            +-----------+-------------------+-----------+            +--------+
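
    The docstring extraction step can be sketched with the standard ast
    module, which reads source text without importing (and so without
    executing) the module. The extract_docstrings function, the dotted
    naming scheme and the "<module>" key are assumptions made for this
    example, not extraction rules defined by this PEP:

```python
import ast

SOURCE = '''
"""Module docstring."""

class Greeter:
    """Say hello."""

    def greet(self, name):
        """Return a greeting for name."""
        return "Hello, " + name

def helper():
    pass
'''

DEFINITIONS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def extract_docstrings(source):
    """Map dotted names to docstrings found in Python source text."""
    found = {}

    def visit(node, prefix):
        # Look at class and function definitions, then recurse so that
        # methods are recorded under their class name.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEFINITIONS):
                name = prefix + child.name
                found[name] = ast.get_docstring(child)
                visit(child, name + ".")

    tree = ast.parse(source)
    found["<module>"] = ast.get_docstring(tree)
    visit(tree, "")
    return found


docs = extract_docstrings(SOURCE)
for name, text in docs.items():
    print(name, "->", text)
```

    Definitions without a docstring, such as helper here, map to None,
    which leaves the decision about undocumented objects to later stages.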

References and Footnotes

    [1] http://python.sf.net/peps/pep-0216.html

    [2] http://www.literateprogramming.com/

    [3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py

    [4] http://starship.python.net/crew/danilo/pythondoc/

    [5] http://happydoc.sf.net/

    [6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html

    [7] http://www.lfw.org/python/

    [8] http://homepage.ntlworld.com/tibsnjoan/docutils/

    [9] http://www.python.org/sigs/doc-sig/

    [10] http://www.bsdi.com/setext/

    [11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/

Local Variables:
mode: indented-text
indent-tabs-mode: nil