[Doc-SIG] reStructuredText: Revised Structured Text Specification

David Goodger dgoodger@bigfoot.com
Fri, 24 Nov 2000 23:15:02 -0500

 reStructuredText: Revised Structured Text Specification
David Goodger (mailto:dgoodger@bigfoot.com)

This revised specification is an attempt to refine, standardize, and extend
the original Structured Text from Digital Creations' Zope

Structured Text is plain text (i.e., text without tags, control characters,
or other embedded formatting information) that uses simple, intuitive, and
language-independent constructs to indicate the structure of a document.
These constructs are equally easy to read in raw and processed forms. This
document is itself an example of Structured Text (raw, if you are reading
the text file, or processed, if you are reading an HTML page, for example).

Simple symbology is used to indicate special constructs, such as headings,
bullet lists, and emphasis. The symbology used is as minimal and unobtrusive
as possible. Less frequently used constructs and extensions to the basic
structured text format may have more elaborate markup.

A Structured Text document is made up of body elements, and optionally
structured into sections. Sections contain body elements and/or subsections.
Body elements consist of:

- paragraphs, which contain text and optional inline markup;
- lists (enumerated, bullet, descriptive, and option), which contain list
  items, which in turn contain body elements;
- code blocks, which contain preformatted text only (spaces and linebreaks
  are preserved);
- block quotes, which contain body elements; and
- tables, whose cells contain body elements.

Blank lines are used to separate paragraphs and other elements. Blank lines
may be omitted when the markup makes element separation unambiguous. Tabs
will be replaced by spaces; tab stops are at every 8th column. Indentation
is significant only where it indicates:

- nesting within list items, such as nested lists, or multiple paragraphs
  within a list item,
- block quotes, and
- the extent of code blocks.
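
The tab rule stated above (tabs replaced by spaces, with tab stops at every
8th column) can be sketched in Python. This is only an illustration: the
function name 'expand_tabs' is hypothetical, and it assumes that "every 8th
column" means stops after 8, 16, 24, ... characters, which is how Python's
built-in 'str.expandtabs' counts.

```python
def expand_tabs(text):
    """Replace each tab with spaces up to the next multiple-of-8 tab stop."""
    # str.expandtabs restarts the column count after every newline,
    # so a whole document can be passed in at once.
    return text.expandtabs(8)


print(repr(expand_tabs("\tindented")))
print(repr(expand_tabs("ab\tcolumn")))
```

Expanding tabs before any indentation is measured keeps the comparison of
indentation levels independent of how the text was typed.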

Paragraphs may contain inline markup, which may not be nested. The inline
markup types are:

- inline code
- strong
- emphasis
- hyperlinks:
  - standalone (absolute URLs)
  - indirect (absolute and relative URLs)
  - internal (cross-links within a document)
  - footnotes

Below is a block diagram overview of the hierarchy of element types in
Structured Text. Elements 'may contain' other elements below them. Element
types in parentheses indicate recursive relationships: sections may contain
(sub)sections, tables contain further body elements, etc. Footnotes,
comments, directives, and hyperlink targets (all starting with '.. ' in
column 1) are independent of the hierarchy and may appear at any point. ::

    +----------------------------------------+               |  comments,  |
    |                             +-------+  |               | directives, |
    |  sections  (begins with one | title |) |   +-----------|  hyperlink  |
    |                             +-------+  |   | footnotes |   targets   |
    | (sections) |              body elements:               |    text     |
    +------------|  code  |       | block  |        |  para- |    block    |
                 | blocks | lists | quotes | tables | graphs |-------------+
                          |     (body elements)     | inline |
                          +-------------------------| markup |

Syntax Details

Escaping Mechanism
The character set available in plain text documents is limited. Every
non-alphanumeric character has been overloaded with functionality: ordinary
written text, mathematics, computer programming, regular expressions,
Internet conventions. No matter what characters are used for markup, they
will already have multiple meanings in written text. Therefore they *will*
appear in text **without being intended as markup**.

A serious markup system requires an escaping mechanism to override the
default meaning of the characters used for the markup. In Structured Text,
we will use the (almost) universal escaping character, the backslash.

A backslash followed by any character escapes the character. The escaped
character represents the character itself, and is prevented from playing a
role in any markup interpretation. The backslash is removed from the output.
A literal backslash is represented by two backslashes in a row.
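
As a rough sketch of the backslash rule above, the following Python function
removes escaping backslashes and keeps the escaped characters. The name
'unescape' is hypothetical, and dropping a lone trailing backslash is an
assumption (the text does not cover that case). The sketch only produces the
output text; a real parser would also have to remember which characters were
escaped so that they play no part in markup recognition.

```python
def unescape(text):
    """Remove escaping backslashes, keeping each escaped character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep the next character literally. A lone trailing
            # backslash is dropped (an assumption, not from the spec).
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


print(unescape(r"2 \* 3 \* 4"))   # 2 * 3 * 4
print(unescape(r"C:\\temp"))      # C:\temp
```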

Comments and Directives
A comment/directive block is a text block:
- whose first line begins with '.. ' in column 1,
- whose second and subsequent lines are indented relative to the first, and
- which ends with a blank or unindented line.

This syntax is used for comments, footnotes, indirect hyperlinks, internal
hyperlinks, directives, and as an extension mechanism. Footnotes and
hyperlinks are described in the section 'Hyperlinks' below.

Arbitrary text may follow the comment start and will be removed from the
processed output. The only restriction on comments is that they not use the
same syntax as directives or hyperlinks. It is recommended to put a blank
line after a comment, to ensure that subsequent indented text blocks are not
accidentally commented out.

Directives are indicated by a text block beginning with '.. ', followed by a
single word (the directive name, [a-zA-Z][a-zA-Z0-9_-]*), two colons, and
whitespace. (Two colons are used to avoid clashes with common comment text
like '.. Warning: modify at your own risk!'.) Directive names are
case-insensitive. Actions taken in response to directives and the
interpretation of data in the directive block or subsequent text block(s)
are directive- and implementation-dependent.
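
A directive start line as described above could be recognized with a regular
expression along these lines. This is a minimal sketch, not a reference
implementation: the names 'DIRECTIVE' and 'directive_name' are hypothetical,
and treating end-of-line after the two colons as acceptable "whitespace" is
an assumption.

```python
import re

# '.. ' in column 1, a directive name, two colons, then whitespace
# (or, by assumption, the end of the line).
DIRECTIVE = re.compile(r"^\.\. ([a-zA-Z][a-zA-Z0-9_-]*)::(?:\s|$)")


def directive_name(line):
    """Return the lower-cased directive name, or None for other lines."""
    match = DIRECTIVE.match(line)
    # Directive names are case-insensitive, so normalize to lower case.
    return match.group(1).lower() if match else None


print(directive_name(".. Keywords::"))
print(directive_name(".. Warning: modify at your own risk!"))
```

The second call shows why two colons are required: ordinary comment text
with a single colon is not mistaken for a directive.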

No directives are defined by the core Structured Text specification.

Directives can be used as an extension mechanism for Structured Text. For
example, a proposal was made in the Python Doc-SIG for keyword-tagged
values. This could be accomplished as follows::

    .. keywords::
    Author: Anne Elk (Miss)
    Revision: 1

If an implementation of Structured Text doesn't recognize a directive, the
entire directive block will simply be treated as a comment. Any subsequent
text blocks will be processed as usual. The implementation may also emit a

Section Structure
Sections are identified through their titles. Titles are marked up with
'underlines' below the title text (and, in some cases, 'overlines' above the
title). An underline/overline is a line of non-alphanumeric characters that
begins in column 1 and extends at least as far as the title text. When a
title has both an overline and an underline, their lengths and characters
must match. There may be any number of levels of section titles.

Rather than imposing a fixed number and order of section title styles, the
order enforced will be the order as encountered. The first style encountered
will be an outermost title (like HTML H1), the second style will be a
subtitle, the third will be a subsubtitle, and so on.
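
The "order as encountered" rule can be sketched as follows. The function
name 'assign_levels' and the encoding of a title style as a pair (underline
character, has-overline flag) are hypothetical choices made for this
illustration only.

```python
def assign_levels(title_styles):
    """Map each title style to a level by order of first appearance."""
    levels = {}
    result = []
    for style in title_styles:
        if style not in levels:
            # A style not seen before becomes the next deeper level.
            levels[style] = len(levels) + 1
        result.append(levels[style])
    return result


# Styles are (underline character, has overline) pairs.
styles = [("=", True), ("=", False), ("-", False), ("=", False), ("-", False)]
print(assign_levels(styles))   # [1, 2, 3, 2, 3]
```

The sketch only assigns levels; it does not check that a new style appears
directly beneath the current innermost level.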

Below are examples of section titles. The first five styles are

     Section Title

    Section Title

    Section Title

    Section Title

    Section Title

    Section Title

    Section Title

    Section Title

    Section Title

Note that the first example title above (overline & underline of '=') is
slightly inset, but it doesn't have to be; this is merely aesthetic and not

A blank line after a title is optional. All text blocks up to the next title
are included in a section (or subsection, etc.).

Not every section title style need be used, nor must any specific style be
used. However, a document must be consistent in its use of section titles:
once established, title styles must be used in the same outer-to-inner
order.

Body Elements
Code Blocks
A paragraph that ends with two colons ('::') signifies that all
following **indented** text blocks are code blocks. No further markup
processing is done within a code block. It is left as-is, and typically
rendered in a monospaced font::

    This is a typical paragraph. A code block follows::

        for a in [5,4,3,2,1]:   # this is some program code, formatted as-is
            print a
        print "it's..."
        # a code block continues until the indentation ends

    This text has returned to the indentation of the first paragraph, is
    outside of the code block, and therefore treated as an ordinary

When '::' is immediately preceded by whitespace, both colons will be
removed from the output. When text immediately precedes the '::', *one*
colon will be removed from the output, leaving only one (i.e., '::' will be
replaced by ':'). When '::' is alone on a line, it will be completely
removed from the output; no empty paragraph will remain.
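
The three cases for the trailing '::' can be sketched in Python as below.
The function name 'strip_code_marker' is hypothetical. Trimming the
whitespace that preceded a removed '::' is an assumption; the rules above
only say that the colons themselves are removed.

```python
def strip_code_marker(paragraph):
    """Return the paragraph as it should be output, or None if nothing remains."""
    body = paragraph.rstrip()
    if not body.endswith("::"):
        return paragraph            # no code block marker; leave unchanged
    head = body[:-2]
    if not head.strip():
        return None                 # '::' alone: no empty paragraph remains
    if head[-1].isspace():
        return head.rstrip()        # whitespace before '::': drop both colons
    return head + ":"               # text before '::': '::' becomes ':'


print(strip_code_marker("A code block follows::"))
print(strip_code_marker("For example, ::"))
print(strip_code_marker("::"))
```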

The minimum leading whitespace will be removed from the code block. In the
example code block above, only the second line ('`    print a`') will keep
its leading whitespace.
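
Removing the minimum leading whitespace corresponds closely to what Python's
standard 'textwrap.dedent' does, so a sketch is short. The wrapper name
'remove_min_indent' is hypothetical, and the sketch assumes tabs have
already been expanded to spaces.

```python
import textwrap


def remove_min_indent(block):
    """Strip the whitespace prefix shared by all non-blank lines."""
    return textwrap.dedent(block)


# The first three lines of the example code block, as they appear in
# the source text (the content is sample data, not executed).
example = (
    "        for a in [5,4,3,2,1]:\n"
    "            print a\n"
    "        print \"it's...\"\n"
)
for line in remove_min_indent(example).splitlines():
    print(repr(line))
```

Only the second line keeps leading whitespace (four spaces), matching the
description above.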

Block Quotes
A text block that is indented relative to the preceding text, without
markup indicating it to be a code block, is a block quote. All markup
processing (for body elements and inline markup) continues within the block

    This is an ordinary paragraph, introducing a quote:

        "It is my business to know things. That is my trade."

        --Sir Arthur Conan Doyle

Bullet Lists
A text block which begins with a '-', '*', or '+', followed by whitespace,
is treated as a bullet list (unordered list) item. For example::

    - This is the first bullet list item.

    - This is the first paragraph in the second item in the list.

      This is the second paragraph in the second item in the list.
      The blank line above this paragraph is required.

      - This is a sublist. A code block needs to be indented even more::

            print "lemon curry?"

    - This is the third item of the main list.
    - This is the fourth item of the main list (no blank line above). The
    second line of this item is not indented relative to the bullet, which
    precludes it from having a second paragraph.
    - A fifth item, whose second line
     is indented only one space relative to the bullet.

     A second paragraph for the fifth item.

    This paragraph is not part of the list.

Blank lines before bullet list items are optional; blank lines are only
required to separate list items from other types of text blocks, as noted in
the example. The indentation of bullet list items takes the bullet itself
into account. In the second list item above:
- The second paragraph is indented relative to the bullet. The second
  paragraph must line up with the left edge of the first.
- The bullet of the sublist is indented relative to the bullet of the outer
  list's item.

Enumerated Lists
A text block which begins with a sequence label is treated as an enumerated
list (ordered list) element. Sequence labels can be::

      1. A sequence of digits followed by a period ('1.'), a colon ('1:'), a
         dash ('1-'), a space and a dash ('1 -'), a right-parenthesis
         ('1)'), or surrounded with parentheses ('(1)').
      B. A single letter (uppercase or lowercase) followed by a period etc.
      III. A roman numeral (uppercase or lowercase) followed by a period
           III.a. A sequence of enumerations, separated by periods and
                  ending with a period etc.
           (III)(b) A sequence of enumerations, each enclosed in parentheses.
           III(c) A mixture of styles.

Nested enumerated lists must be created with indentation (as in the example
above). Enumerators are not interpreted.

Descriptive Lists
A text block with a first line that contains some text, followed by
whitespace, '--', and some more whitespace, is treated as a descriptive list
element. The '--' must be on the first line. The leading text is the term,
and the text after the '--' is the description::

    Type A -- The description may begin immediately after the '--', as long
    as the description is only one paragraph.

    Type B -- The description may begin immediately after the '--', and may
              contain multiple paragraphs if second and subsequent lines are
              indented relative to the left edge of the first line.

              Description paragraph 2, indented to the same level.

    Type C -- Type C is a variation
        of Type B.

        Description paragraph 2, indented to the same level.

    Type D --
        The description may also begin below, indented. This is useful for
        multiple paragraphs, or arbitrary text blocks (lists, etc.).

        Description paragraph 2, indented to the same level.

For type A descriptive list items, the second line of the description
paragraph is checked for ' -- '. If present, it is assumed that it is the
start of another list item. Example::

    Item One -- Description.
    Item Two -- Description.
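
Splitting the first line of a descriptive list item into term and
description might look like the sketch below. The names 'DESCRIPTIVE' and
'split_descriptive_item' are hypothetical, and the sketch assumes the line
has already had its block indentation removed and that the first '--'
surrounded by whitespace is the separator.

```python
import re

# Term, whitespace, '--', then either whitespace and a description or
# the end of the line (as in the Type D example above).
DESCRIPTIVE = re.compile(r"^(\S.*?)\s+--(?:\s+(.*))?$")


def split_descriptive_item(first_line):
    """Return (term, description) or None if the line is not an item."""
    match = DESCRIPTIVE.match(first_line)
    if match is None:
        return None
    return match.group(1), (match.group(2) or "")


print(split_descriptive_item("Item One -- Description."))
print(split_descriptive_item("Type D --"))
print(split_descriptive_item("An ordinary paragraph."))
```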

Option Lists
.. XXX perhaps this should be left as an extension?

Option lists are two-column lists of command-line options and descriptions.
There are two types of options: short and long. Short options consist of one
dash, an option letter, and an optional argument placeholder. Long options
consist of two dashes, an option word, and possibly an argument placeholder.
There must be at least two spaces between the option and the description.
The option acts as a bullet, and the description begins a new text block which
may contain multiple paragraphs and body elements. For example::

    -a       Output all
    -b       Output both (this description is
             quite long)
    -c arg   Output just arg.
    --long   Output all day long.
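
The first line of an option list item could be recognized roughly as
follows. This is a sketch under assumptions: the names 'OPTION_ITEM' and
'parse_option_line' are hypothetical, the argument placeholder is taken to
be a single word separated from the option by one space, and long option
words are limited to letters, digits, and hyphens.

```python
import re

OPTION_ITEM = re.compile(
    # Short option (dash + letter) or long option (two dashes + word),
    # optionally followed by one space and an argument placeholder.
    r"^(?P<option>(?:-[A-Za-z]|--[A-Za-z][A-Za-z0-9-]*)(?: \S+)?)"
    r" {2,}"                      # at least two spaces
    r"(?P<description>\S.*)$"     # start of the description
)


def parse_option_line(line):
    """Return (option, description) or None for non-option lines."""
    match = OPTION_ITEM.match(line)
    if match is None:
        return None
    return match.group("option"), match.group("description")


print(parse_option_line("-c arg   Output just arg."))
print(parse_option_line("--long   Output all day long."))
print(parse_option_line("         quite long)"))
```

Continuation lines such as the last one do not match and would be handled
as part of the preceding item's description.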

Tables
Tables are described with a visual outline made up of the characters '-',
'|', and '+'. The hyphen ('-') is used for horizontal lines (row
separators). The vertical bar ('|') is used for vertical lines (column
separators). The plus sign ('+') is used for intersections of horizontal and
vertical lines.

Each cell contains body elements, and may have multiple paragraphs, lists,
etc. Example:

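    +------------+------------+---------------------------+
    |  Column 1  |  Column 2  | Column 3 & 4 span (Row 1) |
    +------------+------------+------------+--------------+
    |    Column 1 & 2 span    |  Column 3  | - Column 4   |
    +------------+------------+------------+ - Row 2 & 3  |
    |      1     |      2     |      3     | - span       |
    +------------+------------+------------+--------------+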

Paragraphs
Paragraphs are what's left when all other body element markup is exhausted.
They consist of blocks of text with no external markup indicating any other
body element.

Blank lines separate paragraphs from each other and from other body
elements. However, when unambiguous due to markup, blank lines may be

An alternate style of indented-first-line paragraphs is as follows:

        This is a paragraph
    with an indented first
        Here is a second such

Inline Markup
Inline markup is the markup of text within a text block. Inline markup
cannot be nested.

Inline Code
Text enclosed by backquotes (with whitespace or punctuation to the left of
the first backquote and to the right of the second backquote) is treated as
`example code`. Inline code is typically set in a monospaced typeface.

Strong
Text surrounded by '**' characters (with whitespace or punctuation to the
left and to the right) is **emphasized strongly**, typically displayed
as boldface.

Emphasis
Text surrounded by '*' characters (with whitespace or punctuation to the left
and to the right) is *emphasized*, typically displayed as italics.
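
The strong and emphasis rules can be approximated with two regular
expressions, as sketched below. The names 'STRONG', 'EMPHASIS', and
'render_inline' are hypothetical, and the uppercase HTML tags are an assumed
output format. The sketch reads "whitespace or punctuation" as "not a letter
or digit", and it ignores backslash escapes and inline code, which a real
implementation would have to handle first.

```python
import re

# '**' pairs whose outer neighbours are not letters or digits.
STRONG = re.compile(
    r"(?<![A-Za-z0-9])\*\*(?=\S)(.+?)(?<=\S)\*\*(?![A-Za-z0-9])"
)
# Single '*' pairs; neighbouring '*' characters are excluded as well.
EMPHASIS = re.compile(
    r"(?<![A-Za-z0-9*])\*(?=[^\s*])(.+?)(?<=[^\s*])\*(?![A-Za-z0-9*])"
)


def render_inline(text):
    """Convert strong and emphasis markup to HTML-style tags."""
    # Strong is handled first so its asterisks are gone before the
    # emphasis pattern runs.
    text = STRONG.sub(r"<STRONG>\1</STRONG>", text)
    return EMPHASIS.sub(r"<EM>\1</EM>", text)


print(render_inline("they *will* appear **without being markup**."))
print(render_inline("2*3*4"))
```

The second call is left unchanged because the asterisks are surrounded by
digits rather than whitespace or punctuation.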

Hyperlinks
Standalone Hyperlinks
An absolute URL within a text block is treated as a general external
hyperlink with the URL itself as the link's text. For example, ::

    See http://www.python.org for info.

would be marked up in HTML as::

    See <A HREF="http://www.python.org">http://www.python.org</A> for info.
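
A sketch of the conversion shown above, in Python. The names 'URL' and
'link_standalone_urls' are hypothetical, and the URL pattern is a deliberate
simplification (only http, https, and ftp schemes, ending at whitespace,
quotes, or angle brackets). A URL followed directly by sentence punctuation
would need extra care that this sketch does not attempt.

```python
import re
from html import escape

# Simplified, assumed pattern for absolute URLs.
URL = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")


def link_standalone_urls(text):
    """Wrap each absolute URL in an anchor whose text is the URL itself."""
    def replace(match):
        url = match.group(0)
        return '<A HREF="%s">%s</A>' % (escape(url, quote=True), escape(url))
    return URL.sub(replace, text)


print(link_standalone_urls("See http://www.python.org for info."))
```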

Indirect Hyperlinks
Indirect hyperlinks consist of two parts. In the text body, there is a
source link, a name with a trailing underscore::

    See the Python_ home page for info.

Somewhere else in the document is a target link: two dots, a space, an
underscore, the same name used for the source link (no trailing underscore),
colon, whitespace, and a URL (relative or absolute)::

    .. _Python: http://www.python.org

Combined, this is expressed in HTML as::

    See the <A HREF="http://www.python.org">Python</A> home page for info.
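
Resolving single-word indirect hyperlinks might be sketched as follows. The
names 'TARGET', 'SOURCE', and 'resolve_links' are hypothetical. The sketch
assumes the document is given as a list of lines, drops target lines from
the output, leaves source links without a matching target untouched, and
does not escape the URL or handle bracketed phrase-links.

```python
import re

# Target link: '.. _name: URL' starting in column 1.
TARGET = re.compile(r"^\.\. _(?P<name>[^:]+):\s+(?P<url>\S+)\s*$")
# Source link: a single word with a trailing underscore.
SOURCE = re.compile(r"\b(?P<name>[A-Za-z0-9]+)_\b")


def resolve_links(lines):
    """Replace source links with anchors built from the target links."""
    targets = {}
    body = []
    for line in lines:
        match = TARGET.match(line)
        if match:
            targets[match.group("name")] = match.group("url")
        else:
            body.append(line)

    def replace(match):
        url = targets.get(match.group("name"))
        if url is None:
            return match.group(0)     # no target: leave text unchanged
        return '<A HREF="%s">%s</A>' % (url, match.group("name"))

    return [SOURCE.sub(replace, line) for line in body]


document = [
    "See the Python_ home page for info.",
    "",
    ".. _Python: http://www.python.org",
]
print(resolve_links(document))
```

Collecting all targets before substituting means a target may appear
anywhere in the document, before or after the source link that uses it.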

Phrase-links (a hyperlink whose name is a phrase) can be expressed by
enclosing the phrase in brackets and treating the bracketed text as a link

    Want to learn about [my favorite programming language]_?

    .. _my favorite programming language: http://www.python.org

If a phrase-link name contains any colons, they must be backslash-escaped in
the link target.

Internal Hyperlinks
Internal hyperlinks connect one point to another within a document. They are
identical to indirect hyperlinks except that there is no URL in the target
link. For example::

    .. _target:
    This is the target point.

    Clicking on this internal hyperlink will take us back to the target_.

Footnotes
Footnotes are like internal hyperlinks with text in the targets. Footnotes
consist of two parts. In the text body there is a source link: a bracketed
name (an alphanumeric string with no spaces), with a trailing underscore::

    Please refer to the fine manual [GVR2000]_.

Somewhere else in the document (not necessarily at the end) is a target
link: two dots, a space, an underscore, the same bracketed name used for the
source link (no trailing underscore or colon), whitespace, and the footnote

    .. _[GVR2000] Python Documentation, van Rossum, Drake, et al.,

Syntax Diagrams

Paragraphs may be separated by a blank line::

    | paragraph                    |
    |                              |
    | paragraph                    |
    |                              |

First-line-indented paragraphs require no blank line to separate them::

    | paragraph                    |
    |                              |
       | paragraph                 |
    +--+                           |
    |                              |

Code blocks are indicated by '::' at the end of the preceding paragraph::

    | paragraph                    |
    |                        '::'$ |
       | code block                |

List item blocks which are indented relative to the bullet or enumerator may
contain multiple body elements (paragraphs, etc.)::

    | '- ' | list item             |
    +------|                       |
           | (body elements)+      |

    | '- ' | list item             |
    +--+---+                       |
       | (body elements)+          |

List item blocks which are not indented relative to the bullet or enumerator
contain a single paragraph only::

    | '- ' | paragraph             |
    |------+                       |
    |                              |

Block quotes are indented relative to the preceding text::

    | current level of             |
    | indentation                  |
       | block quote               |
       | (body elements)+          |

Comments begin in column 1 with two dots and a space::

    | ^'.. ' | comment block       |
    +--+-----+                     |
       |                           |

Directives are comments which begin with a directive name and two colons::

    | ^'.. ' name '::' | directive |
    +--+---------------+ block     |
       |                           |

Footnotes use comment syntax with an underscore footnote name in brackets::

    | ^'.. _[' name ']' | footnote |
    +--+----------------+          |
       | (body elements)+          |

Hyperlink targets use comment syntax with an underscore, link name, and a

    | ^'.. _' name ':' | link      |
    +--+---------------+ target    |
       |                           |

(Internal hyperlinks have empty link blocks. Indirect hyperlinks have an
absolute or relative URL in their link blocks.)