PEP 256: Docstring Processing System Framework

Wed Jun 13 00:37:54 EDT 2001

I am posting this PEP to comp.lang.python for greatest community exposure.
Please direct replies to the Python Documentation SIG's mailing list:
mailto:doc-sig at python.org.

There's a wide ASCII diagram near the end, which will probably be folded
beyond recognition. Please download the PEP from one of the following
sources for a clean view.

In addition to the master copy at http://python.sf.net/peps/pep-0256.txt
(HTML at http://python.sf.net/peps/pep-0256.html), a working copy is kept at
the project web site, http://docstring.sf.net/.

-- 
David Goodger    dgoodger at bigfoot.com    Open-source projects:
 - Python Docstring Processing System: http://docstring.sf.net
 - reStructuredText: http://structuredtext.sf.net
 - The Go Tools Project: http://gotools.sf.net

PEP: 256
Title: Docstring Processing System Framework
Version: $Revision: 1.1 $
Last-Modified: $Date: 1935/06/06 05:55:51 $
Author: dgoodger at bigfoot.com (David Goodger)
Discussions-To: doc-sig at python.org
Status: Draft
Type: Standards Track
Requires: PEP 257 Docstring Conventions
          PEP 258 DPS Generic Implementation Details
Created: 01-Jun-2001
Post-History:

Abstract

    Python modules, classes and functions have a string attribute
    called __doc__.  If the first expression inside the definition is
    a literal string, that string is assigned to the __doc__
    attribute, called a documentation string or docstring.  It is
    often used to summarize the interface of the module, class or
    function.
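
    As a minimal illustration of the mechanism described above (the
    names 'area', 'Shape' and 'undocumented' are invented for this
    sketch, which uses present-day Python syntax):

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height


class Shape:
    """Base class for the shapes in this sketch."""


def undocumented():
    x = 1  # the first statement is not a string literal, so __doc__ is None
    return x


# The literal string that opens each definition is stored in __doc__.
print(area.__doc__)
print(Shape.__doc__)
print(undocumented.__doc__)
```

    A definition whose first statement is not a literal string simply
    has no docstring; its __doc__ attribute is None.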

    There is no standard format (markup) for docstrings, nor are there
    standard tools for extracting docstrings and transforming them
    into useful structured formats (e.g., HTML, DocBook, TeX).  Those
    tools that do exist are for the most part unmaintained and unused.
    The issues surrounding docstring processing have been contentious
    and difficult to resolve.

    This PEP proposes a Docstring Processing System (DPS) framework.
    It separates out the components (program and conceptual), enabling
    the resolution of individual issues either through consensus (one
    solution) or through divergence (many).  It promotes standard
    interfaces which will allow a variety of plug-in components (e.g.,
    input parsers and output formatters) to be used.

    This PEP presents the concepts of a DPS framework independently of
    implementation details.

Rationale

    Python lends itself to inline documentation.  With its built-in
    docstring syntax, a limited form of Literate Programming [2] is
    easy to do in Python.  However, there are no satisfactory standard
    tools for extracting and processing Python docstrings.  The lack
    of a standard toolset is a significant gap in Python's
    infrastructure; this PEP aims to fill the gap.

    There are standard inline documentation systems for some other
    languages.  For example, Perl has POD (plain old documentation)
    and Java has Javadoc, but neither of these meshes with the Pythonic
    way.  POD is very explicit, but takes after Perl in terms of
    readability.  Javadoc is HTML-centric; except for '@field' tags,
    raw HTML is used for markup.  There are also general tools such as
    Autoduck and Web (Tangle & Weave), useful for multiple languages.

    There have been many attempts to write autodocumentation systems
    for Python (not an exhaustive list):

    - Marc-Andre Lemburg's doc.py [3]

    - Daniel Larsson's pythondoc & gendoc [4]

    - Doug Hellmann's HappyDoc [5]

    - Laurence Tratt's Crystal [6]

    - Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python
      standard library; see below)

    - Tony Ibbs' docutils [8]

    These systems, each with different goals, have had varying degrees
    of success.  A problem with many of the above systems was
    over-ambition.  They provided a self-contained set of components: a
    docstring extraction system, an input parser, an internal
    processing system and one or more output formatters.  Inevitably,
    one or more components had serious shortcomings, preventing the
    system from being adopted as a standard tool.

    Throughout the existence of the Python Documentation Special
    Interest Group (Doc-SIG) [9], consensus on a single standard
    docstring format has never been reached.  A lightweight, implicit
    markup has been sought, for the following reasons (among others):

    1. Docstrings written within Python code are available from within
       the interactive interpreter, and can be 'print'ed.  Plaintext
       is therefore used for easy readability.

    2. Programmers want to add structure to their docstrings, without
       sacrificing raw docstring readability.  Unadorned plaintext
       cannot be transformed ('up-translated') into useful structured
       formats.

    3. Explicit markup (like XML or TeX) has been widely considered
       unreadable by the uninitiated.

    4. Implicit markup is aesthetically compatible with the clean and
       minimalist Python syntax.

    Early on, variants of Setext (Structure Enhanced Text) [10],
    including Digital Creations' StructuredText [11], were proposed
    for Python docstring formatting.  Hereafter we will collectively
    call these variants 'STexts'.  Although used by some (including in
    most of the above-listed autodocumentation tools), these markup
    schemes have failed to become standard because:

    - STexts have been incomplete: lacking 'essential' constructs that
      people want to use in their docstrings, STexts are rendered less
      than ideal.  Note that these 'essential' constructs are not
      universal; everyone has their own requirements.

    - STexts have sometimes been surprising: bits of text are marked
      up unexpectedly, leading to user frustration.

    - SText implementations have been buggy.

    - Some STexts have had no formal specification except for the
      implementation itself.  A buggy implementation meant a buggy
      spec, and vice-versa.

    - There has been no mechanism to get around the SText markup rules
      when a markup character is used in a non-markup context.
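
    The 'surprising' and 'no escape mechanism' problems can be
    sketched with a deliberately naive implicit-markup rule.  The
    rule, the regular expression and the function name below are
    assumptions made for illustration only; they are not taken from
    any actual SText implementation:

```python
import re

# Hypothetical rule for this sketch: text between asterisks is emphasized.
EMPHASIS = re.compile(r"\*([^*\s][^*]*)\*")


def naive_markup(text):
    """Wrap every *...* span in <em> tags; there is no way to escape '*'."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


# The intended use works as the author expects.
intended = naive_markup("This is *important*.")

# Asterisks used as multiplication signs are marked up unexpectedly.
surprise = naive_markup("Returns a*b if b else a*c.")

print(intended)
print(surprise)
```

    In the second call the two multiplication signs are taken as a
    pair of emphasis markers, and the author has no means of saying
    otherwise.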

    Recognizing the deficiencies of STexts, some people have proposed
    using explicit markup of some kind.  There have been proposals for
    using XML, HTML, TeX, POD, and Javadoc at one time or another.
    Proponents of STexts have vigorously opposed these proposals, and
    the debates have continued off and on for at least five years.

    It has become clear (to this author, at least) that the "all or
    nothing" approach cannot succeed, since no all-encompassing
    proposal could possibly be agreed upon by all interested parties.
    A modular component approach, where components may be multiply
    implemented, is the only chance at success.  By separating out the
    issues, we can form consensus more easily (smaller fights ;-), and
    accept divergence more readily.

    Each of the components of a docstring processing system should be
    developed independently.  A 'best of breed' system should be
    chosen and/or developed and eventually included in Python's
    standard library.

Pydoc & Other Existing Systems

    Pydoc is part of the Python 2.1 standard library.  It extracts and
    displays docstrings from within the Python interactive
    interpreter, from the shell command line, and from a GUI window
    into a web browser (HTML).  In the case of GUI/HTML, except for
    some heuristic hyperlinking of identifier names, no formatting of
    the docstrings is done.  They are presented within <p><small><tt>
    tags to avoid unwanted line wrapping.  Unfortunately, the result
    is not pretty.
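
    A short sketch of docstring extraction with pydoc follows.  It
    uses the pydoc module as shipped with present-day Python 3, which
    may differ in detail from the Python 2.1 version discussed here;
    the function 'add' is invented for the example:

```python
import pydoc


def add(a, b):
    """Return the sum of a and b."""
    return a + b


# getdoc() returns the cleaned-up docstring of an object.
summary = pydoc.getdoc(add)

# render_doc() builds the plain-text page that help() would display;
# the plaintext renderer avoids terminal bold/overstrike characters.
page = pydoc.render_doc(add, renderer=pydoc.plaintext)

print(summary)
print(page)
```

    In both cases the docstring text is passed through as written;
    no markup inside it is interpreted.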

    The functionality proposed in this PEP could be added to or used
    by pydoc when serving HTML pages.  However, the proposed docstring
    processing system's functionality is much more than pydoc needs
    (in its current form).  Either an independent tool will be
    developed (which pydoc may or may not use), or pydoc could be
    expanded to encompass this functionality and *become* the
    docstring processing system (or one such system).  That decision
    is beyond the scope of this PEP.

    Similarly, the authors of other existing docstring processing
    systems may or may not choose compatibility with this framework.
    However, if this framework is accepted and adopted as the Python
    standard, compatibility will become an important consideration in
    these systems' future.

Specification

    The docstring processing system framework consists of components,
    as follows:

    1. Docstring conventions.  Documents issues such as:

       - What should be documented where.

       - First line is a one-line synopsis.

       PEP 257, "Docstring Conventions" [12], documents these issues.

    2. Docstring processing system generic implementation details.
       Documents issues such as:

       - High-level spec: what a DPS does.

       - Command-line interface for executable script.

       - System Python API.

       - Docstring extraction rules.

       - Input parser API.

       - Intermediate internal data structure: output from input parser,
         input to output formatter.

       - Output formatter API.

       - Output management.

       These issues are applicable to any docstring processing system
       implementation.  PEP 258, "DPS Generic Implementation Details"
       [13], documents these issues.

    3. Docstring processing system implementation.

    4. Input markup specifications: docstring syntax.

    5. Input parser implementations.

    6. Output formats (HTML, XML, TeX, DocBook, info, etc.).

    7. Output formatter implementations.

    Components 1, 2, and 3 will be the subject of individual companion
    PEPs, although they may be merged into this PEP once consensus is
    reached.  If there is only one implementation, PEPs for components
    2 & 3 can be combined.  Multiple PEPs will be necessary for each
    of components 4, 5, 6, and 7.  An alternative to the PEP mechanism
    may be used instead, since these are not directly related to the
    Python language.
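
    The separation of components listed above can be sketched in a
    few lines of present-day Python.  Everything here is an
    assumption made for illustration: the function names, the
    registries, and the use of a list of tuples as the intermediate
    data structure are hypothetical and are not the interfaces
    specified by this PEP or by PEP 258:

```python
import ast
import html

SOURCE = '''
def greet(name):
    """Return a greeting.

    The name is inserted verbatim.
    """
    return "Hello, " + name
'''


def extract_docstrings(source):
    """Docstring extraction: map each function/class name to its docstring."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc is not None:
                found[node.name] = doc
    return found


def parse_plaintext(docstring):
    """Input parser: build the (hypothetical) intermediate data structure.

    Here the structure is just a list of ("paragraph", text) tuples.
    """
    blocks = [b.strip() for b in docstring.split("\n\n") if b.strip()]
    return [("paragraph", " ".join(b.split())) for b in blocks]


def format_html(tree):
    """Output formatter: render the intermediate structure as HTML."""
    return "\n".join("<p>%s</p>" % html.escape(text) for _kind, text in tree)


def format_text(tree):
    """A second formatter plugged into the same interface."""
    return "\n\n".join(text for _kind, text in tree)


# Plug-in registries: alternative parsers and formatters share one interface.
PARSERS = {"plaintext": parse_plaintext}
FORMATTERS = {"html": format_html, "text": format_text}


def process(source, parser="plaintext", formatter="html"):
    """Run extraction, parsing and formatting for every docstring found."""
    parse, render = PARSERS[parser], FORMATTERS[formatter]
    docs = extract_docstrings(source)
    return {name: render(parse(doc)) for name, doc in docs.items()}


result = process(SOURCE)
print(result["greet"])
```

    Because the parser and the formatter meet only at the
    intermediate structure, either one can be replaced without
    touching the other; process(SOURCE, formatter="text") reuses the
    same parser with a different output format.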

    The following diagram shows an overview of the framework.
    Interfaces are indicated by double-borders.  The ASCII diagram is
    very wide; please turn off line wrapping to view it:

                                                    +========================+
                                                    | Command-Line Interface |
                                                    +========================+
                                                    | Executable Script      |
                                                    +------------------------+
                                                                |
                                                                | calls
                                                                v
                                                    +===========================================+ returns   +---------+
                                                    | System Python API                         |==========>| output  |
                              +--------+            +===========================================+           | objects |
           _    writes        | Python |      reads | Docstring Processing System               |           +---------+
          / \  ==============>| module |<===========|                                           |
          \_/                 +--------+            | input      | transformation, | output     |            +--------+
           |             +-------------+    follows | docstring  | integration,    | object     | writes     | output |
         --+--  consults | docstring   |<-----------| extraction | linking         | management |===========>| files  |
           |   --------->| conventions |            +============+=====+=====+=====+============+            +--------+
          / \            +-------------+            | parser API       |     |    formatter API |
         /   \           +-------------+            +===========+======+     +======+===========+            +--------+
        author  consults | markup      | implements | input     | intermediate      | output    | implements | output |
               --------->| syntax spec |<-----------| parser    | data structure    | formatter |----------->| format |
                         +-------------+            +-----------+-------------------+-----------+            +--------+

Project Web Site

    A SourceForge project has been set up for this work at
    http://docstring.sf.net.

References and Footnotes

    [1] http://python.sf.net/peps/pep-0216.html

    [2] http://www.literateprogramming.com/

    [3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py

    [4] http://starship.python.net/crew/danilo/pythondoc/

    [5] http://happydoc.sf.net/

    [6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html

    [7] http://www.lfw.org/python/

    [8] http://homepage.ntlworld.com/tibsnjoan/docutils/

    [9] http://www.python.org/sigs/doc-sig/

    [10] http://www.bsdi.com/setext/

    [11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/

    [12] http://python.sf.net/peps/pep-0257.html

    [13] http://python.sf.net/peps/pep-0258.html

Copyright

    This document has been placed in the public domain.

Acknowledgements

    This document borrows text from PEP 216 "Docstring Format" by
    Moshe Zadka [1].  It is intended as a reorganization of PEP 216
    and its approach.

    This document also borrows ideas from the archives of the Python
    Doc-SIG.  Thanks to all members past & present.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: