[Doc-SIG] A Plan for Structured Text

David Goodger dgoodger@bigfoot.com
Fri, 24 Nov 2000 22:57:10 -0500


All the activity over PEP 216 and StructuredTextNG has spurred me to
complete a project I left off many months ago: a plan, an analysis, and a
revised specification for Structured Text. You may be saying, "Not again!",
because this topic has come up many times in Doc-SIG and elsewhere. But I
believe in the idea, and I believe that it is the best hope for Python to
get its own superior equivalent to Perl's POD and Java's JavaDoc.

I am posting these documents to the Doc-SIG to ask for everyone's input. I
will soon put them (or revised versions after discussion) in a more
permanent home at http://structuredtext.sourceforge.net ('restructuredtext'
was too long for a SourceForge project name, unfortunately).

Please rest assured that this is not another argument without code to back
it up. I will commit to writing a new implementation, or assisting in
modifying an existing implementation. This is a serious proposal. I want to
see this get *done*!

-- 
David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net


============================
 A Plan for Structured Text
============================
David Goodger (mailto:dgoodger@bigfoot.com)
2000-11-24

Structured Text is useful for quickly creating simple web pages and for
in-line program documentation (in comments in any programming language, or in
documentation strings in languages that support them, such as Python and
Emacs Lisp). Structured Text can be used in systems which extract in-line
comments and docstrings to create API documentation.

A Structured Text implementation should be designed to allow further
extension and modification for specific application domains, and its code
should be readable and understandable. The following block diagram sketches
out a hypothetical 'DocXtor' system, showing how Structured Text fits in::

    +-----------------------------------------------------------------+
    |             DocXtor: Python Documentation Extractor             |
    |                                                                 |
    |--------------------------------+                                |
    |        Python-specific         |                                |
    |   Structured Text extensions   |          +---------------------|
    |--------------------------------+-------+  |   Python language   |
    |            reStructuredText:           |  |       services      |
    |    Revised Structured Text processor   |  |  (parser.py, etc.)  |
    +----------------------------------------+--+---------------------+
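
To make the right-hand box of the diagram concrete, here is a minimal sketch
of the extraction step only. It assumes Python's standard ``ast`` module
stands in for the 'Python language services'; the names
``extract_docstrings`` and ``SAMPLE`` are hypothetical and are not part of
any existing DocXtor code.

```python
import ast


def extract_docstrings(source):
    """Return a dict mapping dotted names to the docstrings found in source."""
    tree = ast.parse(source)
    found = {}

    # The module docstring, if any, is stored under a placeholder key.
    module_doc = ast.get_docstring(tree)
    if module_doc is not None:
        found["<module>"] = module_doc

    def visit(node, prefix):
        # Only classes and functions that are direct children are followed;
        # definitions hidden inside 'if' or 'try' blocks are ignored here.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef)):
                name = prefix + child.name
                doc = ast.get_docstring(child)
                if doc is not None:
                    found[name] = doc
                visit(child, name + ".")

    visit(tree, "")
    return found


# Hypothetical sample module, used only for illustration.
SAMPLE = '''"""Module summary."""

class Board:
    """A Go board."""

    def play(self, move):
        """Place a stone."""
'''

docs = extract_docstrings(SAMPLE)
for name in sorted(docs):
    print(name, "->", docs[name])
```

In the diagram's terms, each extracted string would then be handed to the
Structured Text processor; the sketch stops before that step.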

History
=======
StructuredText.py was developed by Digital Creations
(http://www.digicool.com) and first released in 1996. It is now released as
a part of the open-source 'Z Object Publishing Environment' (ZOPE,
http://www.zope.org). Structured Text itself is based on the earlier Setext
specification (http://www.bsdi.com/setext).

I discovered Structured Text while searching for a solution to my need to
document the Python modules in my SGF parser & summarizer project (see
http://gotools.sourceforge.net). Version 1.1 of StructuredText was included
in the 'pythondoc' project
(http://starship.python.net/crew/danilo/pythondoc/). Although I have yet to
get pythondoc to work for me, I found Structured Text to be almost ideal for
my needs. I joined the Python Doc-SIG (Documentation Special Interest Group,
http://www.python.org/sigs/doc-sig/) mailing list and found an ongoing
discussion of the shortcomings of the Structured Text 'standard'.

I decided to modify the original module with my own extensions and some
suggested by the Doc-SIG members. I soon realized that the module was not
written with extension in mind, so I embarked upon a general reworking,
including adapting it to the 're' regular expression module (I was more used
to 're', it was more powerful, and the regular expressions in 'regex' and
'regsub' were nearly unintelligible with their excess of backslashes). Soon
after I completed the modifications, I discovered that StructuredText.py was
up to version 1.23 in the ZOPE distribution. Implementing the new syntax
extensions from version 1.23 proved to be an exercise in frustration, as the
complexity of the module had become overwhelming.

I decided that a complete rewrite was in order, and even started a
SourceForge project, reStructuredText_. Unfortunately I was sidetracked (or,
if you ask my wife: fortunately I became employed) and stopped working on
this project. Recently, development on StructuredTextNG (Next Generation)
has begun at Digital Creations. It seems to have many improvements, but
still suffers from many of the problems of classic StructuredText (ST-TOS,
StructuredText: The Original Spec?).

.. _reStructuredText: http://structuredtext.sourceforge.net

Thus I recently made the time to enumerate the problems and possible
solutions, and complete the first draft of a revised Structured Text
specification. My motivations are as follows:

- I need a standard format for inline documentation of the programs I write.
  I believe many others have the same need. Structured Text could be that
  format.

- I believe in the Structured Text idea and want to help formalize the
  standard. However, I feel it has flaws that desperately need fixing.

- Perl has POD, Java has JavaDoc. Neither of these meshes with the Pythonic
  worldview. Structured Text could form the foundation for a documentation
  extraction system (cool name: DocXtor, the Python Documentation Extractor)
  that Python needs and could greatly benefit from. There have been many
  attempts to write such a system, with varying success. A 'best of breed'
  system should be chosen and/or developed and included in Python's standard
  library.

- Structured Text is only a foundation. It should not aspire to be the
  entire system. In fact, for Python docstring extraction, the Structured
  Text syntax would probably have to be extended to allow for higher-level
  semantic constructs (keyword-tagged values and the like). I don't want
  Structured Text or the hypothetical Python DocXtor to die because of
  overcomplication.

- Most of all, I want to help ease the documentation chore, the bane of many
  a programmer.

Structured Text Goals
=====================
1. To allow people to create richly structured documents using an ordinary
   text editor, without having to think about the markup.

2. The markup shall be intuitive, minimal, and unobtrusive.

   A. Intuitive -- Almost any plaintext document should be valid structured
      text. Structured text takes its cues from the kind of ad-hoc markup
      used in plaintext email messages and newsgroup postings. Wherever
      possible, structure should be inferred from such naturally occurring
      markup.

   B. Minimal -- The markup that is used must be as sparse as possible.
      HTML/XML-type tags are too cumbersome for manually edited, often
      embedded, multipurpose texts.

   C. Unobtrusive -- A document in structured text format shall read equally
      well in raw form as in processed form.

         Please keep in mind a basic goal of structured text, which is to
         keep the raw text as readable as possible. **If you don't buy into
         this idea, you're probably wasting your time.**

          
         (http://dev.zope.org/Members/jim/StructuredTextWiki/DocumentationStrings)

3. Multiple output formats shall be possible:

   - internal DOM-based data structures
   - XML/SGML: multiple DTDs
   - HTML
   - TeX/LaTeX
   - Structured Text
   - others (extensible)

4. The markup shall be modular and extensible for use in specific
   application domains.

5. The implementation shall be well documented, both internally (doc strings
   and comments) and externally (user's manual, API guide, tutorial).
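
Goals 2 through 4 can be illustrated with a small sketch: structure is
inferred from underlined titles and blank lines, parsed once into an internal
node list, and then handed to whichever output writer is registered. This is
only a toy under stated assumptions, not the proposed implementation; the
names ``parse``, ``WRITERS``, ``writer``, ``render`` and ``SAMPLE`` are
hypothetical, and only titles and paragraphs are recognized.

```python
import html
import re

# A title underline: two or more identical '=' or '-' characters.
UNDERLINE = re.compile(r"^([=\-])\1+$")


def parse(text):
    """Split text into ('title', str) and ('paragraph', str) nodes."""
    nodes = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # A line followed by an underline at least as long is a title.
        if (len(lines) >= 2 and UNDERLINE.match(lines[1])
                and len(lines[1]) >= len(lines[0])):
            nodes.append(("title", lines[0]))
            lines = lines[2:]
        if lines:
            nodes.append(("paragraph", " ".join(lines)))
    return nodes


# Registry of output writers; new formats are added without touching parse().
WRITERS = {}


def writer(name):
    """Decorator that registers an output function under a format name."""
    def register(func):
        WRITERS[name] = func
        return func
    return register


@writer("html")
def to_html(nodes):
    tags = {"title": "h1", "paragraph": "p"}
    return "\n".join(
        "<{0}>{1}</{0}>".format(tags[kind], html.escape(text))
        for kind, text in nodes
    )


@writer("text")
def to_text(nodes):
    out = []
    for kind, text in nodes:
        out.append(text + "\n" + "=" * len(text) if kind == "title" else text)
    return "\n\n".join(out)


def render(text, fmt):
    """Parse once, then dispatch to the writer registered for fmt."""
    return WRITERS[fmt](parse(text))


SAMPLE = """History
=======
Structured Text is based on Setext.

It reads well in raw form."""

print(render(SAMPLE, "html"))
```

Keeping the parser ignorant of the output formats is the design choice that
matters here: an application domain can register its own writer, or
post-process the node list, without modifying the core.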

StructuredTextZen
-----------------
(from http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextZen)

What attracted me to StructuredText in the first place was the fact that
it's the only structured document format that I could UseWithoutThinking. No
matter how much of a DocBook wizard or HTML wizard you are, you still have
to constantly think about elements, entities, and what not. By contrast, you
can largely use StructuredText... and concentrate on your text rather than
your formatting.

Obviously this makes it very easy to use for experienced authors, but it
also has tremendous benefits for folks who want to use it as part of a
content delegation solution. StructuredText can be picked up in a very short
time by non-techies and is probably the only solution available where people
who don't necessarily think about document structure can author documents
that are semantically parsable...