PEP 256: Docstring Processing System Framework
David Goodger
dgoodger at bigfoot.com
Wed Jun 13 00:37:54 EDT 2001
I am posting this PEP to comp.lang.python for greatest community exposure.
Please direct replies to the Python Documentation SIG's mailing list:
mailto:doc-sig at python.org.
There's a wide ASCII diagram near the end, which will probably be folded
beyond recognition. Please download from one of the following sources for a
clean view.
In addition to the master copy at http://python.sf.net/peps/pep-0256.txt
(HTML at http://python.sf.net/peps/pep-0256.html), a working copy is kept at
the project web site, http://docstring.sf.net/.
--
David Goodger dgoodger at bigfoot.com Open-source projects:
- Python Docstring Processing System: http://docstring.sf.net
- reStructuredText: http://structuredtext.sf.net
- The Go Tools Project: http://gotools.sf.net
PEP: 256
Title: Docstring Processing System Framework
Version: $Revision: 1.1 $
Last-Modified: $Date: 2001/06/06 05:55:51 $
Author: dgoodger at bigfoot.com (David Goodger)
Discussions-To: doc-sig at python.org
Status: Draft
Type: Standards Track
Requires: PEP 257 Docstring Conventions
PEP 258 DPS Generic Implementation Details
Created: 01-Jun-2001
Post-History:
Abstract
Python modules, classes and functions have a string attribute
called __doc__. If the first expression inside the definition is
a literal string, that string is assigned to the __doc__
attribute, called a documentation string or docstring. It is
often used to summarize the interface of the module, class or
function.
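As a small illustration of the mechanism just described (the function
name "area" is made up for this example, and inspect.getdoc() is
used only to show the indentation-stripped form):

```python
import inspect

def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers.
    """
    return width * height

# The literal string that opens the function body is stored in __doc__.
raw = area.__doc__

# inspect.getdoc() returns the same text with uniform indentation removed.
clean = inspect.getdoc(area)

# By convention the first line serves as a one-line synopsis.
synopsis = clean.splitlines()[0]

print(synopsis)
```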
There is no standard format (markup) for docstrings, nor are there
standard tools for extracting docstrings and transforming them
into useful structured formats (e.g., HTML, DocBook, TeX). Those
tools that do exist are for the most part unmaintained and unused.
The issues surrounding docstring processing have been contentious
and difficult to resolve.
This PEP proposes a Docstring Processing System (DPS) framework.
It separates out the components (program and conceptual), enabling
the resolution of individual issues either through consensus (one
solution) or through divergence (many). It promotes standard
interfaces which will allow a variety of plug-in components (e.g.,
input parsers and output formatters) to be used.
This PEP presents the concepts of a DPS framework independently of
implementation details.
Rationale
Python lends itself to inline documentation. With its built-in
docstring syntax, a limited form of Literate Programming [2] is
easy to do in Python. However, there are no satisfactory standard
tools for extracting and processing Python docstrings. The lack
of a standard toolset is a significant gap in Python's
infrastructure; this PEP aims to fill the gap.
There are standard inline documentation systems for some other
languages. For example, Perl has POD (plain old documentation)
and Java has Javadoc, but neither of these meshes with the Pythonic
way. POD is very explicit, but takes after Perl in terms of
readability. Javadoc is HTML-centric; except for '@field' tags,
raw HTML is used for markup. There are also general tools such as
Autoduck and Web (Tangle & Weave), useful for multiple languages.
There have been many attempts to write autodocumentation systems
for Python (not an exhaustive list):
- Marc-Andre Lemburg's doc.py [3]
- Daniel Larsson's pythondoc & gendoc [4]
- Doug Hellmann's HappyDoc [5]
- Laurence Tratt's Crystal [6]
- Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python
standard library; see below)
- Tony Ibbs' docutils [8]
These systems, each with different goals, have had varying degrees
of success. A problem with many of the above systems was
over-ambition. They provided a self-contained set of components: a
docstring extraction system, an input parser, an internal
processing system and one or more output formatters. Inevitably,
one or more components had serious shortcomings, preventing the
system from being adopted as a standard tool.
Throughout the existence of the Python Documentation Special
Interest Group (Doc-SIG) [9], consensus on a single standard
docstring format has never been reached. A lightweight, implicit
markup has been sought, for the following reasons (among others):
1. Docstrings written within Python code are available from within
the interactive interpreter, and can be 'print'ed. Thus the
use of plaintext for easy readability.
2. Programmers want to add structure to their docstrings, without
sacrificing raw docstring readability. Unadorned plaintext
cannot be transformed ('up-translated') into useful structured
formats.
3. Explicit markup (like XML or TeX) has been widely considered
unreadable by the uninitiated.
4. Implicit markup is aesthetically compatible with the clean and
minimalist Python syntax.
Early on, variants of Setext (Structure Enhanced Text) [10],
including Digital Creation's StructuredText [11], were proposed
for Python docstring formatting. Hereafter we will collectively
call these variants 'STexts'. Although used by some (including in
most of the above-listed autodocumentation tools), these markup
schemes have failed to become standard because:
- STexts have been incomplete: lacking 'essential' constructs that
people want to use in their docstrings, STexts are rendered less
than ideal. Note that these 'essential' constructs are not
universal; everyone has their own requirements.
- STexts have been sometimes surprising: bits of text are marked
up unexpectedly, leading to user frustration.
- SText implementations have been buggy.
- Some STexts have had no formal specification except for the
implementation itself. A buggy implementation meant a buggy
spec, and vice-versa.
- There has been no mechanism to get around the SText markup rules
when a markup character is used in a non-markup context.
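A toy illustration of the "surprising" and "no escape" problems
listed above. The one-rule markup below is invented for this sketch
and is not taken from any actual SText implementation:

```python
import re

def naive_emphasis(text):
    """Turn *word* into <em>word</em> (hypothetical one-rule markup)."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)

# The intended use works as expected.
intended = naive_emphasis("This is *important*.")

# Asterisks used as multiplication signs are marked up as well, and this
# toy markup offers no way to escape them.
surprise = naive_emphasis("Compute x = a*b*c first.")

print(intended)
print(surprise)
```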
Recognizing the deficiencies of STexts, some people have proposed
using explicit markup of some kind. There have been proposals for
using XML, HTML, TeX, POD, and Javadoc at one time or another.
Proponents of STexts have vigorously opposed these proposals, and
the debates have continued off and on for at least five years.
It has become clear (to this author, at least) that the "all or
nothing" approach cannot succeed, since no all-encompassing
proposal could possibly be agreed upon by all interested parties.
A modular component approach, where components may be multiply
implemented, is the only chance at success. By separating out the
issues, we can form consensus more easily (smaller fights ;-), and
accept divergence more readily.
Each of the components of a docstring processing system should be
developed independently. A 'best of breed' system should be
chosen and/or developed and eventually included in Python's
standard library.
Pydoc & Other Existing Systems
Pydoc is part of the Python 2.1 standard library. It extracts and
displays docstrings from within the Python interactive
interpreter, from the shell command line, and from a GUI window
into a web browser (HTML). In the case of GUI/HTML, except for
some heuristic hyperlinking of identifier names, no formatting of
the docstrings is done. They are presented within <p><small><tt>
tags to avoid unwanted line wrapping. Unfortunately, the result
is not pretty.
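For reference, a minimal sketch of pydoc's programmatic use. It
assumes the pydoc module shipped with current Python 3 releases, whose
API differs in detail from the Python 2.1 version described above; the
function "greet" is a made-up example:

```python
import pydoc

def greet(name):
    """Return a greeting for `name`.

    The body of this docstring is shown as plain text.
    """
    return "Hello, " + name

# getdoc() fetches the docstring and strips its indentation.
doc = pydoc.getdoc(greet)

# render_doc() builds the text pydoc would display for the object; the
# plaintext renderer avoids the overstrike sequences used for bold text.
page = pydoc.render_doc(greet, renderer=pydoc.plaintext)

print(page)
```

Note that the docstring text appears in the rendered page as written;
no markup in it is interpreted.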
The functionality proposed in this PEP could be added to or used
by pydoc when serving HTML pages. However, the proposed docstring
processing system's functionality is much more than pydoc needs
(in its current form). Either an independent tool will be
developed (which pydoc may or may not use), or pydoc could be
expanded to encompass this functionality and *become* the
docstring processing system (or one such system). That decision
is beyond the scope of this PEP.
Similarly, the authors of other existing docstring processing
systems may or may not choose compatibility with this framework.
However, if this framework is accepted and adopted as the Python
standard, compatibility will become an important consideration in
these systems' future.
Specification
The docstring processing system framework consists of components,
as follows::
1. Docstring conventions. Documents issues such as:
- What should be documented where.
- First line is a one-line synopsis.
PEP 257, "Docstring Conventions" [12], documents these issues.
2. Docstring processing system generic implementation details.
Documents issues such as:
- High-level spec: what a DPS does.
- Command-line interface for executable script.
- System Python API.
- Docstring extraction rules.
- Input parser API.
- Intermediate internal data structure: output from input parser,
input to output formatter.
- Output formatter API.
- Output management.
These issues are applicable to any docstring processing system
implementation. PEP 258, "DPS Generic Implementation Details"
[13], documents these issues.
3. Docstring processing system implementation.
4. Input markup specifications: docstring syntax.
5. Input parser implementations.
6. Output formats (HTML, XML, TeX, DocBook, info, etc.).
7. Output formatter implementations.
Components 1, 2, and 3 will be the subject of individual companion
PEPs, although they may be merged into this PEP once consensus is
reached. If there is only one implementation, PEPs for components
2 & 3 can be combined. Multiple PEPs will be necessary for each
of components 4, 5, 6, and 7. An alternative to the PEP mechanism
may be used instead, since these are not directly related to the
Python language.
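The separation of components listed above might be sketched as
follows. This is only an illustration under stated assumptions, not
the API of PEP 258 or of any existing tool: the names DocNode,
extract_docstrings, plain_parser, html_formatter, text_formatter and
process are all hypothetical, and the "markup" is plain paragraphs.

```python
import ast
from dataclasses import dataclass, field
from html import escape


@dataclass
class DocNode:
    """Hypothetical intermediate data structure for one documented object."""
    name: str
    synopsis: str
    body: list = field(default_factory=list)  # remaining paragraphs


def extract_docstrings(source):
    """Docstring extraction: yield (name, docstring) pairs from source code."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef)):
            doc = ast.get_docstring(node)
            if doc is not None:
                yield getattr(node, "name", "<module>"), doc


def plain_parser(name, docstring):
    """Input parser: first line is the synopsis, blank lines split paragraphs."""
    synopsis, _, rest = docstring.partition("\n")
    paragraphs = [" ".join(p.split()) for p in rest.split("\n\n") if p.strip()]
    return DocNode(name, synopsis.strip(), paragraphs)


def html_formatter(node):
    """Output formatter: render a DocNode as an HTML fragment."""
    parts = ["<h2>%s</h2>" % escape(node.name),
             "<p><b>%s</b></p>" % escape(node.synopsis)]
    parts += ["<p>%s</p>" % escape(p) for p in node.body]
    return "\n".join(parts)


def text_formatter(node):
    """Output formatter: render a DocNode as plain text."""
    return "\n\n".join(["%s: %s" % (node.name, node.synopsis)] + node.body)


def process(source, parser, formatter):
    """System API: extraction -> parser -> intermediate data -> formatter."""
    return [formatter(parser(name, doc))
            for name, doc in extract_docstrings(source)]


SOURCE = '''
def spam(x):
    """Return x unchanged.

    This function exists only
    as an example.
    """
    return x
'''

html_out = process(SOURCE, plain_parser, html_formatter)[0]
text_out = process(SOURCE, plain_parser, text_formatter)[0]
print(html_out)
print(text_out)
```

Because the parser and the formatters meet only at DocNode, either
side can be replaced without touching the other, which is the point of
the plug-in interfaces proposed here.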
The following diagram shows an overview of the framework.
Interfaces are indicated by double-borders. The ASCII diagram is
very wide; please turn off line wrapping to view it:
                                                    +========================+
                                                    | Command-Line Interface |
                                                    +========================+
                                                    |   Executable Script    |
                                                    +------------------------+
                                                                |
                                                                | calls
                                                                v
                                           +===========================================+  returns  +---------+
                                           |             System Python API             |==========>| output  |
                     +--------+            +===========================================+           | objects |
   _      writes     | Python |   reads    |        Docstring Processing System        |           +---------+
  / \ ==============>| module |<===========|                                           |
  \_/                +--------+            | input      | transformation, | output     |            +--------+
   |            +-------------+  follows   | docstring  | integration,    | object     |   writes   | output |
 --+-- consults | docstring   |<-----------| extraction | linking         | management |===========>| files  |
   |  --------->| conventions |            +============+=====+=====+=====+============+            +--------+
  / \           +-------------+            |    parser API    |     |  formatter API   |
 /   \          +-------------+            +===========+======+     +======+===========+            +--------+
author consults | markup      | implements | input     | intermediate      | output    | implements | output |
      --------->| syntax spec |<-----------| parser    | data structure    | formatter |----------->| format |
                +-------------+            +-----------+-------------------+-----------+            +--------+
Project Web Site
A SourceForge project has been set up for this work at
http://docstring.sf.net.
References and Footnotes
[1] http://python.sf.net/peps/pep-0216.html
[2] http://www.literateprogramming.com/
[3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py
[4] http://starship.python.net/crew/danilo/pythondoc/
[5] http://happydoc.sf.net/
[6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html
[7] http://www.lfw.org/python/
[8] http://homepage.ntlworld.com/tibsnjoan/docutils/
[9] http://www.python.org/sigs/doc-sig/
[10] http://www.bsdi.com/setext/
[11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/
[12] http://python.sf.net/peps/pep-0257.html
[13] http://python.sf.net/peps/pep-0258.html
Copyright
This document has been placed in the public domain.
Acknowledgements
This document borrows text from PEP 216 "Docstring Format" by
Moshe Zadka [1]. It is intended as a reorganization of PEP 216
and its approach.
This document also borrows ideas from the archives of the Python
Doc-SIG. Thanks to all members past & present.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: