[Doc-SIG] PEP: Docstring Processing System Framework
Sun, 03 Jun 2001 10:30:08 -0400
After much thought and reorganization*, I am pleased to offer this candidate
PEP. Two related candidate PEPs follow, all part of the Docstring Processing
System (DPS) project. I will be seeking PEP numbers ASAP; once obtained, I
will post them to comp.lang.python.
The website for this project is http://docstring.sf.net. The three PEPs
posted here, plus supporting XML DTDs, are available for individual browsing
there. I've released version 0.1 of the project, which contains all
specification files as well as the code. Apart from one module
(dps.statemachine, useful for line-based parsing using regular expressions),
there is only a skeleton of the core system. I welcome any input.
Several similar projects already exist. I invite their authors to take a
look at the approach presented in this PEP and to consider consolidating our
efforts. I will be happy to add developers and project admins to the
SourceForge project; please let me know if you are interested. I would like
this to be an open, community project!
I've simultaneously released the reStructuredText project, an input parser
component for the DPS, at http://structuredtext.sf.net. I'll be posting the
updated specification to Doc-SIG shortly. My hope is that these projects
will form the foundation for a standard documentation tool for Python.
* See the "History" section of
David Goodger email@example.com
Open-source projects:
- Python Docstring Processing System: http://docstring.sf.net
- reStructuredText: http://structuredtext.sf.net
- The Go Tools Project: http://gotools.sf.net
Title: Docstring Processing System Framework
Author: firstname.lastname@example.org (David Goodger)
Type: Standards Track
Requires: (Docstring Conventions PEP),
(DPS Generic Implementation Details PEP)
Python modules, classes and functions have a string attribute called
__doc__. If the first expression inside the definition is a literal
string, that string is assigned to the __doc__ attribute, called a
documentation string or docstring. It is often used to summarize the
interface of the module, class or function.
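As a brief illustration of the rule above (a sketch in present-day Python 3 syntax; the names area, Shape and undocumented are invented for the example):

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height


class Shape:
    """Base class for drawable shapes."""


def undocumented():
    # The first statement is not a string literal, so __doc__ stays None.
    return None


print(area.__doc__)          # Return the area of a width-by-height rectangle.
print(Shape.__doc__)         # Base class for drawable shapes.
print(undocumented.__doc__)  # None
```

The same attribute is what the interactive interpreter and any extraction tool would read.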
There is no standard format (markup) for docstrings, nor are there
standard tools for extracting docstrings and transforming them into
useful structured formats (e.g., HTML, DocBook, TeX). Those tools that
do exist are for the most part unmaintained and unused. The issues
surrounding docstring processing have been contentious and difficult to resolve.
This PEP proposes a Docstring Processing System (DPS) framework. It
separates out the components (program and conceptual), enabling the
resolution of individual issues either through consensus (one solution)
or through divergence (many). It promotes standard interfaces which
will allow a variety of plug-in components (e.g., input parsers and
output formatters) to be used.
This PEP presents the concepts of a DPS framework independently of
This document has been placed in the public domain.
This document borrows text from PEP 216 "Docstring Format" by Moshe
Zadka. It is intended as a reorganization of PEP 216 and its
This document also borrows ideas from the archives of the Python
Doc-SIG. Thanks to all members past & present.
A SourceForge project has been set up for this work at
Python lends itself to inline documentation. With its built-in
docstring syntax, a limited form of Literate Programming is easy to
do in Python. However, there are no satisfactory standard tools for
extracting and processing Python docstrings. The lack of a standard
toolset is a significant gap in Python's infrastructure; this PEP aims
to fill the gap.
There are standard inline documentation systems for some other
languages. For example, Perl has POD (plain old documentation) and Java
has Javadoc, but neither of these meshes with the Pythonic way. POD is
very explicit, but takes after Perl in terms of readability. Javadoc is
HTML-centric; except for '@field' tags, raw HTML is used for markup.
There are also general tools such as Autoduck and Web (Tangle & Weave),
useful for multiple languages.
There have been many attempts to write autodocumentation systems for
Python (not an exhaustive list):
- Marc-Andre Lemburg's doc.py 
- Daniel Larsson's pythondoc & gendoc 
- Doug Hellmann's HappyDoc 
- Laurence Tratt's Crystal 
- Ka-Ping Yee's htmldoc & pydoc (pydoc.py is now part of the Python
standard library; see below)
- Tony Ibbs' docutils 
These systems, each with different goals, have had varying degrees of
success. A problem with many of the above systems was overambition.
They provided a self-contained set of components: a docstring
extraction system, an input parser, an internal processing system and
one or more output formatters. Inevitably, one or more components had
serious shortcomings, preventing the system from being adopted as a
Throughout the existence of the Python Documentation Special Interest
Group (Doc-SIG), consensus on a single standard docstring format
has never been reached. A lightweight, implicit markup has been sought,
for the following reasons (among others):
1. Docstrings written within Python code are available from within the
interactive interpreter, and can be 'print'ed. Thus the use of
plaintext for easy readability.
2. Programmers want to add structure to their docstrings, without
sacrificing raw docstring readability. Unadorned plaintext cannot be
transformed ('up-translated') into useful structured formats.
3. Explicit markup (like XML or TeX) has been widely considered
unreadable by the uninitiated.
4. Implicit markup is aesthetically compatible with the clean and
minimalist Python syntax.
Early on, variants of Setext (Structure Enhanced Text), including
Digital Creations' StructuredText, were proposed for Python
docstring formatting. Hereafter we will collectively call these
variants 'STexts'. Although used by some (including in most of the
above-listed autodocumentation tools), these markup schemes have failed
to become standard because:
- STexts have been incomplete: lacking 'essential' constructs that
people want to use in their docstrings, STexts are rendered less than
ideal. Note that these 'essential' constructs are not universal;
everyone has their own requirements.
- STexts have sometimes been surprising: bits of text are marked up
unexpectedly, leading to user frustration.
- SText implementations have been buggy.
- Some STexts have had no formal specification except for the
implementation itself. A buggy implementation meant a buggy spec, and
- There has been no mechanism to get around the SText markup rules when
a markup character is used in a non-markup context.
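The "surprising markup" and "no escape mechanism" problems can be illustrated with a toy rule. The sketch below is not any actual SText implementation; the asterisk-for-emphasis rule, the backslash escape, and the function names are assumptions made purely for illustration:

```python
import re

# Toy rule (hypothetical): text between two asterisks becomes emphasis.
EMPHASIS = re.compile(r"\*(.+?)\*")

# Same rule, but an asterisk preceded by a backslash is left alone.
ESCAPABLE = re.compile(r"(?<!\\)\*(.+?)(?<!\\)\*")


def naive_markup(text):
    """Apply the emphasis rule with no way to opt out."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


def markup_with_escapes(text):
    """Apply the emphasis rule, then turn escaped asterisks into literal ones."""
    marked = ESCAPABLE.sub(r"<em>\1</em>", text)
    return marked.replace("\\*", "*")


print(naive_markup("This is *important*."))
# This is <em>important</em>.
print(naive_markup("Returns 2*x*y."))
# Returns 2<em>x</em>y.
print(markup_with_escapes(r"Returns 2\*x\*y."))
# Returns 2*x*y.
```

The second call shows the kind of unexpected markup that frustrates users; the third shows what an escape mechanism buys, at the cost of extra characters in the raw docstring.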
Recognizing the deficiencies of STexts, some people have proposed using
explicit markup of some kind. There have been proposals for using XML,
HTML, TeX, POD, and Javadoc at one time or another. Proponents of
STexts have vigorously opposed these proposals, and the debates have
continued off and on for at least five years.
It has become clear (to this author, at least) that the "all or
nothing" approach cannot succeed, since no all-encompassing proposal
could possibly be agreed upon by all interested parties. A modular
component approach, where components may be multiply implemented, is
the only chance at success. By separating out the issues, we can form
consensus more easily (smaller fights ;-), and accept divergence more
Each of the components of a docstring processing system should be
developed independently. A 'best of breed' system should be chosen
and/or developed and eventually included in Python's standard library.
Pydoc & Other Existing Systems
Pydoc is part of the Python 2.1 standard library. It extracts and
displays docstrings from within the Python interactive interpreter,
from the shell command line, and from a GUI window into a web browser
(HTML). In the case of GUI/HTML, except for some heuristic hyperlinking
of identifier names, no formatting of the docstrings is done. They are
presented within <p><small><tt> tags to avoid unwanted line wrapping.
Unfortunately, the result is not pretty.
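For reference, the plain-text side of pydoc can be driven from code roughly as follows. This is a sketch against the pydoc module in current Python 3 releases (pydoc.getdoc, pydoc.render_doc and the pydoc.plaintext renderer), which may differ from the Python 2.1 version discussed here; the function greet is made up for the example:

```python
import pydoc


def greet(name):
    """Return a greeting for name.

    The docstring text is shown as written; no markup is interpreted.
    """
    return "Hello, " + name


# getdoc() returns the docstring with its indentation normalized.
summary = pydoc.getdoc(greet).splitlines()[0]

# render_doc() builds the text page that help() would display; the
# plaintext renderer avoids the overstrike sequences used for bold.
page = pydoc.render_doc(greet, renderer=pydoc.plaintext)

print(summary)
print(page)
```

Note that the docstring body passes through untouched, which is the gap a docstring processing system is meant to fill.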
The functionality proposed in this PEP could be added to or used by
pydoc when serving HTML pages. However, the proposed docstring
processing system's functionality is much more than pydoc needs (in its
current form). Either an independent tool will be developed (which
pydoc may or may not use), or pydoc could be expanded to encompass this
functionality and *become* the docstring processing system (or one such
system). That decision is beyond the scope of this PEP.
Similarly, the authors of other existing docstring processing systems
may or may not choose compatibility with this framework.
However, if this framework is accepted and adopted as the Python
standard, compatibility will become an important consideration in these
The docstring processing system framework consists of components, as
follows:
1. Docstring conventions. Documents issues such as:
- What should be documented where.
- First line is a one-line synopsis.
2. Docstring processing system generic implementation details.
Documents issues such as:
- High-level spec: what a DPS does.
- Command-line interface for executable script.
- System Python API.
- Docstring extraction rules.
- Input parser API.
- Intermediate internal data structure: output from input parser,
input to output formatter.
- Output formatter API.
- Output management.
These issues are applicable to any docstring processing system
3. Docstring processing system implementation.
4. Input markup specifications: docstring syntax.
5. Input parser implementations.
6. Output formats (HTML, XML, TeX, DocBook, info, etc.).
7. Output formatter implementations.
Components 1, 2, and 3 will be the subject of individual companion
PEPs, although they may be merged into this PEP once consensus is
reached. If there is only one implementation, PEPs for components 2 & 3
can be combined. Multiple PEPs will be necessary for each of components
4, 5, 6, and 7. An alternative to the PEP mechanism may be used
instead, since these are not directly related to the Python language.
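The way the components might fit together can be sketched in a few lines. Everything here is an assumption made for illustration, not part of the proposal: the names DocNode, extract_docstrings, plain_parser, text_formatter, html_formatter and process are hypothetical, the "markup" is just blank-line-separated paragraphs, and the code uses present-day Python 3 (ast, dataclasses).

```python
import ast
import html
from dataclasses import dataclass, field


@dataclass
class DocNode:
    """Hypothetical intermediate data structure for one documented object."""
    name: str
    synopsis: str
    body: list = field(default_factory=list)


def extract_docstrings(source):
    """Docstring extraction: yield (name, docstring) pairs from source code."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield getattr(node, "name", "<module>"), doc


def plain_parser(name, text):
    """Input parser: first line is the synopsis, the rest are paragraphs."""
    synopsis, _, rest = text.partition("\n")
    paragraphs = [" ".join(p.split()) for p in rest.split("\n\n") if p.strip()]
    return DocNode(name, synopsis.strip(), paragraphs)


def text_formatter(node):
    """Output formatter: indented plain text."""
    lines = ["%s: %s" % (node.name, node.synopsis)]
    lines += ["    " + p for p in node.body]
    return "\n".join(lines)


def html_formatter(node):
    """Output formatter: minimal HTML, with escaping."""
    parts = ["<h2>%s</h2>" % html.escape(node.name),
             "<p><em>%s</em></p>" % html.escape(node.synopsis)]
    parts += ["<p>%s</p>" % html.escape(p) for p in node.body]
    return "\n".join(parts)


def process(source, parser, formatter):
    """Run extraction, then the chosen parser and formatter plug-ins."""
    return [formatter(parser(name, doc))
            for name, doc in extract_docstrings(source)]


SOURCE = '''
def area(width, height):
    """Return the area of a rectangle.

    Both arguments must be
    non-negative numbers.
    """
    return width * height
'''

print(process(SOURCE, plain_parser, text_formatter)[0])
print(process(SOURCE, plain_parser, html_formatter)[0])
```

Because the parser and the formatter only meet at DocNode, either can be swapped without touching the other, which is the property the component split is meant to provide.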
The following diagram shows an overview of the framework. Interfaces
are drawn with double borders ("====").
[Diagram: the original wide ASCII art was broken by line wrapping and
cannot be reproduced here. The relationships it shows are:]
- The Executable Script (Command-Line Interface) sits on top of the
System Python API, which returns output objects.
- The author writes the Python module; the Docstring Processing System
reads it.
- The author consults the docstring conventions; docstring extraction
follows them.
- The author consults the markup syntax spec; the input parser
implements it, behind the parser API.
- The output formatter, behind the formatter API, implements the
output format.
- Output management writes the output files.
- Other labels in the diagram: input object, output object,
intermediate data structure, and "transformation, integration,
linking".