[Doc-SIG] Re: reStructuredText Markup Specification
Wolfgang Lipp
castor@snafu.de
Wed, 6 Jun 2001 09:46:27 MET
Intro
On Sun, 03 Jun 2001 10:34:24 -0400,
David Goodger wrote:
> For section structure, indentation
> is unnatural and awkward.
First of all, I would like to say that I find David's
proposal overwhelming for both its comprehensiveness and
its length. Pretty much everything that might be covered
is indeed mentioned, up to the coverage of DOM.
However, I would like to object strongly to the advertised
abolition of indention to indicate document structure.
In this posting, I would like to show, with a hopefully
appropriate example followed by a detailed discussion,
that all of David's fears concerning structure-by-indention
are, with one small caveat, not substantial in my view.
Structure-by-Indention A Valid Generalization
In contradiction to what David says, I find indent[at]ion
natural and elegant, an appropriate and innovative means
to express structure both in the application to
programming language sources and to text markup. Moreover,
the principle of hierarchical-structure-through-indention
is applicable to a third domain, namely, to structured,
'non-binary' data files.
In other words, there is a chance not only to produce a
standard for the limited use of docstrings, there is
definitely a chance to set up a framework in which the
sources of: (1) the programming language, (2a) its inline
documentation and (2b) other information materials, as
well as (3a) configurational files and (3b) databases(*)
are interpretable.
(*) as far as 'binary' formats are not deemed to
be more appropriate for some purposes; but think
of mid-sized address collections etc., where non-
binary, user-editable formats are definitely a big
plus.
At first, this appears to be a gargantuan task, given the
volume of the docstring proposals alone. It may also seem
to be off-topic to mention configuration files when
documentation is discussed. However, the distinction
between a structured database, a configuration file and a
piece of documentation is a rather superficial one: in all
cases we have sequences of values that are (optionally)
given names, where the values themselves may be either
'terminal' (e.g., simple numbers or sequences of characters) or
in turn contain more names with associated values: a name,
therefore, may refer either to a single value or an
arbitrarily deeply nested sequence of names and values. In
a language like Python, the source symbolizes sequences of
statements and expressions; again, where one entire group
of statements is dependent on one single statement,
indention is used to convey the scope of this dependency.
Structure-by-Indention vs. Structure-by-"Style"
Missing important generalizations at key junctures of a
developing process may prove to be very expensive in the
long run. As a case in point, let's have a look at the
most popular configuration file format to date, the 'ini'
format; this consists of an unordered list of names, each
followed by an equals sign and a value, optionally divided
into non-hierarchical sections indicated by bracketed
names, such as:
[Bolts]
foo = 42
bar0 = 3
bar1 = 4
[LeversTypeA]
foo = 84
outer_x = 10
outer_y = 12
inner_x = 8
inner_y = 8
[LeversTypeB]
foo = 92
It is easy to see that the specification of the 'ini'
format fails to recognize 'sequences' and 'structures'. It
is, therefore, very clumsy to express these concepts in
this format when the need arises; commonly, kludges of the
kind sketched above (numbered and compound names) are used
as workarounds. A revised version along the lines of Python
syntax and the original StructuredText proposal painlessly
removes these shortcomings (beta implementation available
as 'pylon.xcfg'; as an option, one could consider making a
trailing '=' obligatory at the end of lines that start a
block):
Bolts
    foo = 42
    bar             # or, "bar = (3,4)"
        3
        4
Levers
    TypeA
        foo = 84
        outer
            x = 10
            y = 12
        inner
            x = 8
            y = 8
    TypeB
        foo = 92
A structure like this maps quite naturally onto Python's
mappings and lists. It is also easy to see that the
structure thus indicated is in principle neither different
from a Python script, nor from a text with paragraphs and
headings.
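To make the mapping concrete, here is a minimal Python sketch that reads
the indented form into nested dicts and lists. It is an illustration only,
not the 'pylon.xcfg' implementation: the name parse_indented, the fixed
indent width of four spaces, the '#' comment rule and the use of
ast.literal_eval for values are all assumptions made for this example.

```python
import ast

SAMPLE = """\
Bolts
    foo = 42
    bar             # or, "bar = (3,4)"
        3
        4
Levers
    TypeA
        foo = 84
        outer
            x = 10
            y = 12
    TypeB
        foo = 92
"""

def _significant_lines(text, width):
    """Return (depth, content) pairs, skipping blank lines and '#' comments."""
    result = []
    for raw in text.splitlines():
        body = raw.split("#", 1)[0].rstrip()
        if body.strip():
            stripped = body.lstrip(" ")
            result.append(((len(body) - len(stripped)) // width, stripped))
    return result

def _block(lines, pos, depth):
    """Parse the run of lines at `depth` starting at `pos`.

    A block is treated as either a mapping or a list; mixing both
    in one block is not handled by this sketch.
    """
    mapping, items = {}, []
    while pos < len(lines) and lines[pos][0] == depth:
        content = lines[pos][1]
        pos += 1
        if "=" in content:                                  # leaf: name = value
            name, value = (part.strip() for part in content.split("=", 1))
            mapping[name] = ast.literal_eval(value)
        elif pos < len(lines) and lines[pos][0] == depth + 1:
            mapping[content], pos = _block(lines, pos, depth + 1)   # block header
        else:                                               # bare value: list item
            items.append(ast.literal_eval(content))
    return (items if items and not mapping else mapping), pos

def parse_indented(text, width=4):
    """Hypothetical entry point: indented text in, nested dicts/lists out."""
    lines = _significant_lines(text, width)
    tree, pos = _block(lines, 0, 0)
    if pos != len(lines):
        raise ValueError("unexpected indention at %r" % lines[pos][1])
    return tree
```

With the sample above, parse_indented(SAMPLE)['Levers']['TypeA']['outer']
is the mapping {'x': 10, 'y': 12}, and 'bar' comes out as the list [3, 4].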
Now, it is very simple to copy, for example, the section
'Levers.TypeA.outer' to another place, even to another
structural level, say 'Bolts'. You will then have to
change indention; in case you forget that, either
something ungrammatical or something with a different
meaning will result. However, such a mistake would in this
case be only of local impact (i.e., the place where the
problem is recognized is the place where the problem
actually occurs; also, one can delineate the offending
construct and could choose to skip it in processing):
Bolts
    foo = 42
    bar = (3,4)
        outer       # ungrammatical indention
            x = 10
            y = 12
[...]
(Of course, the ungrammaticality stems from certain
assumptions made; a syntax where the above construct would
yield, for example, {'bar': {'NN': (3,4), 'outer': {...}}},
'NN' being a default name, is also conceivable).
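The locality of such an error can be illustrated with a small checker.
This is only a sketch under stated assumptions: the name check_indention
is made up, indents are four spaces, and only a line without '=' is
allowed to open a deeper block. Under those rules the complaint lands on
the very line that was pasted without adjusting its indention.

```python
BROKEN = """\
Bolts
    foo = 42
    bar = (3,4)
        outer       # pasted without fixing its indention
            x = 10
            y = 12
"""

FIXED = """\
Bolts
    foo = 42
    bar = (3,4)
    outer
        x = 10
        y = 12
"""

def check_indention(text, width=4):
    """Return (line_number, message) for the first problem, or None."""
    prev_depth, prev_opens_block = 0, False
    for lineno, raw in enumerate(text.splitlines(), 1):
        body = raw.split("#", 1)[0].rstrip()
        if not body.strip():
            continue                      # blank or comment-only line
        spaces = len(body) - len(body.lstrip(" "))
        if spaces % width:
            return lineno, "indent is not a multiple of %d" % width
        depth = spaces // width
        # Going deeper is legal only one step at a time, and only
        # directly after a line that can open a block.
        if depth > prev_depth and not (prev_opens_block and depth == prev_depth + 1):
            return lineno, "unexpected indent"
        prev_depth, prev_opens_block = depth, "=" not in body
    return None
```

Here check_indention(BROKEN) points at line 4, the pasted 'outer' line
itself, while check_indention(FIXED) finds nothing to report.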
However, in David's proposal, the same structure would
look something like this (please correct me):
======
Bolts
======
foo = 42
bar = 3,4 # or something like this
======
Levers
======
TypeA
-----
foo = 84
outer
.....
x = 10
y = 12
inner
.....
x = 8
y = 8
TypeB
-----
foo = 92
Now, people, please excuse me! but you do not want me or
anyone to believe that this is a clearer, more obvious,
'self-documenting', more maintainable format, do you?
Does anyone honestly think this gets *any* better only
because documentation has typically *more* material
between section headings?
Next, consider what happens when you want to copy
Levers.TypeA.outer to Bolts:
======
Bolts
======
[...]
outer    -+
-----     |  <---+
x = 10    |      |
y = 12   -+      |
                 |
                 |
======           |
Levers           |
======           |
[...]            |
                 |
outer    -+      |
.....     +------+
x = 10    |
y = 12   -+
To me, it is completely non-obvious that the markup of
element 'outer' has to change from one arbitrary style
(dotted) to another arbitrary style (dashed). Nothing in
these styles indicates their hierarchical meaning. Also,
in case I forget to change that markup, I do not get an
error anywhere near the offending line -- since markups are
determined ad-hoc by precedent, it may in fact be the
*following* section that looks ungrammatical. This kind of
markup is difficult to understand and hard to maintain.
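The effect can be reproduced with a simplified model of levels assigned
by precedent. This is a sketch, not the proposal's reference parser: the
function title_levels and its two rules (a known underline character
keeps the level it had at first use; a new character may only open the
next level below the deepest one known, and only from there) are
assumptions chosen to mirror the behaviour described above.

```python
ORIGINAL = [("Bolts", "="), ("Levers", "="), ("TypeA", "-"),
            ("outer", "."), ("inner", "."), ("TypeB", "-")]

# 'outer' copied under 'Bolts' with its dotted underline left unchanged.
PASTED = [("Bolts", "="), ("outer", "."), ("Levers", "="), ("TypeA", "-"),
          ("outer", "."), ("inner", "."), ("TypeB", "-")]

def title_levels(titles):
    """Map (title, underline_char) pairs to (title, level) pairs.

    Raises ValueError at the first title whose style cannot be placed.
    """
    styles = []      # styles[0] marks level 1, styles[1] level 2, ...
    current = 0      # level of the most recent title
    result = []
    for text, char in titles:
        if char in styles:
            level = styles.index(char) + 1
            if level > current + 1:
                raise ValueError("%r: level jumps from %d to %d"
                                 % (text, current, level))
        else:
            if current != len(styles):
                raise ValueError("%r: new style %r not allowed below level %d"
                                 % (text, char, current))
            styles.append(char)
            level = len(styles)
        current = level
        result.append((text, level))
    return result
```

Under these assumptions the pasted, still-dotted 'outer' is accepted
without complaint (the dotted style simply becomes level 2 by
precedent), and the ValueError is raised only at 'TypeA', one section
later.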
Refutation
I would now like to detail my objections against the
reasons David put forth to show the unfitness of indention
as a means to indicate structure in natural-language
texts. Quoting David's posting (see the end of this
message for the original), these reasons are:
(1) Using indentation is [u]nnatural
(1a) Most published works use title style (type
size, face, weight, and position) and/or
section/subsection numbering rather than
indentation to indicate hierarchy. When
indentation is used, it is usually the formatted
end-result and is there for aesthetic rather than
structural purposes.
(1b) [T]he style of the section title should
indicate its structure. [...] In fact, [section
structure through title style] is already in
widespread use in plain text documents, including
in Python's standard distribution (such as the
toplevel README_ file).
(2) Using indentation is [a]wkward.
(2a) One must think about the formatting as the
text is keyed in.
(2b) And when structural changes are made (it is
very common during the composition of a document
to rearrange sections and their hierarchy) we must
use block-indent and -unindent functions.
(2c) In order to edit documents using indentation,
relatively advanced text editors must be used.
(3) Applying indentation to ordinary written text is
hypergeneralization.
Following are my objections.
1: Indention Unnatural?
Indention is 'natural' -- that's why C programmers use
it although they don't have it, that's why GvR chose
indention for Python although he didn't have to.
Indention is a typographical device that came into
widespread use at least with the spread of the
typewriter, and what are plain text editors but
software typewriters?
It is a near-no-no *not* to use indention in *some*
places, even in places where other typographical means
of expression *are* available (verse, quotes, mottos,
abstracts...), or where it is redundant (unindented
code is almost tantamount to obfuscated code).
It is interesting to see that David's proposal *does*
keep indention for these cases:
* lists
* term definitions
* literal blocks
... more?
Therefore, indention is indeed highly 'natural' and
also 'appropriate'. I try to show this in the
discussion of almost every single question in this
posting. True, it may be painful in cases where you
must use an editor that is weak on this point. Above,
I even wrote a list of two integers in indented style;
most of the time a list notation would be clearer
(short items: parentheses -- long items: indention).
For this and other reasons, it may be worthwhile to
consider a syntax that allows both indention and other
means (such as parentheses) to indicate document
structure, so both are always available options.
(Python, of course, uses braces, brackets and parentheses
for mappings, lists, and tuples, but indention for classes,
functions, etc.; here, both means are used, but not
interchangeably so).
1a: Published Works Don't Do It, Let's Do It
As stated above, professionally typeset and published
materials do in some cases use indention, but mostly
not for the use of indicating section structure.
However, 'real' typography, as David notes, too, has
means that are simply not available in a 'typewriter'
situation. And, when you open a book, you will see
that it is, in fact, not only the size and position of
titles that matter -- it is also the space given to
them: chapters often commence only on odd-numbered pages,
leaving an entirely blank page to their left, sections
have a considerable amount of space above and some
more below them. Granted, we also meet with the
occasional embellishment here and there, which may
take on the shape of a line.
However, David's argument in itself is a bit of a
problem because it is claimed that we users of editors
should mimic the typographers in their ways (and
shun indention, because, you see, in books they don't
use it either), while it is at the same time
acknowledged that we lack the means to do so (we don't
have big type, so let's use underline). This is
contradictory.
Secondly, it is stated that indention "is usually the
formatted end-result and is there for aesthetic rather
than structural purposes" -- well, it seems to me that
David's underlined section headings are rather
'aesthetically' than 'structurally' motivated, at
least when compared to indention.
Also, it is a little bit of a folly to throw in the
concept of 'aesthetic' at this point of the discussion
and expect people to understand this to be a kind of a
bug, a failure, a misunderstanding, a faulty approach
to consider 'aesthetic' aspects when what you really
wanted was clarifying 'structure'. If those books with
their type sizes and empty spaces were not 'aesthetic',
who would read them? If our design of typeset text
were not 'aesthetic', could we even manage to write
it? If not about the 'aesthetics', what else is it
that we talk about here? If aesthetics are 'out', I
guess then parentheses are 'in': indention is not the
easiest to parse, so why bother?
It is precisely 'aesthetics' and almost nothing but
'aesthetics' that we talk about here. This is a 99%
pure 'aesthetics' discussion, the rest being
feasibility (in Python code). It is not 'practical',
not from the computer's point of view, to have
scripting languages -- that's a mere fuss, a waste of
CPU cycles. No computer 'writes' documentation, no
computer cares about the look of documents, drafted or
printed. It is we who are doing this, and what we are
looking for is a practical, manageable, readable and
pleasing way of doing the job, in other words, we are
looking for a beautiful solution of the problem.
However, the proposed solution for over- and
underlined section titles, while it may have some
visual appeal (that a line put into a comment would
also lend), fails to fulfill the promise of being a
practical means (as demonstrated when we tried copying
elements in the last section). We surprisingly also
lack the technical means to conveniently manage over-
and underlining of headlines, as I try to show in
point (2c), below.
So, typographers don't do it (well, not all of the
time), but they have other means. We are programmers
and documentation authors, working on software
typewriters, so let's do it.
2: Indention Awkward?
Indention is elegant. Trying to convince Python people
of the elegance of indention is unnecessary, they're
already convinced of this. It should be hard for a
programmer to accept a scheme that is purportedly
'natural' in its indication of 'structure' when it
uses arbitrary, highly context-sensitive and ambiguous
lines instead of (any or all of) indention,
parentheses, begin-end-commands, i.e. those means that
are, for a programmer, the most logical choices.
2a: Think While You Type?
David, no. You are with your text when you write it,
aren't you? And what, please, is the application of a
proper (and arbitrary) line style but an activity that
necessitates a certain amount of awareness? I reject
this point.
2b: Structural Changes Difficult?
The main points to be made here have already been
discussed in the previous section, entitled
'Structure-by-Indention vs. Structure-by-"Style"'. Let
me add here that I consider it one of the vices --
or omissions -- of HTML and most markup schemes that
authors are forced to indicate all section levels
explicitly. As David says, the writing of a document
is a process where many changes even in the structure
of a document are made. But how often do authors,
who for some reason tackle their source with a plain
editor, have to go through all those tags, exchanging
all the numbers in all the tags, twice for every
heading, upon finding an unsatisfactory structuring of
the document! Of how little help all the advanced
regular expression replacement tools are in this case!
How big the surprise on finding out that with the
newfangled docstring format they will find themselves at
very much the same impasse again! How much would they
love to even use M$W*rd, if only for the outline view!
Outline view with symbolic indention! Isn't this
cornerstone software of the evil WYSIWYG empire one of
the most unlikely places in the universe to find
concrete indention being replaced by abstract,
symbolic indention? But it works, and it's easy:
change structure, no problem, drag, drop, all formats
cared for.
Indention is the single trick that allows users with
plain text editors to prove they're no dummies when it
comes to restructuring. David complains about having
to change the indention all the time. But this is a
feature, not a bug. It is well intended that the level
of the section is *not* written down. It is done due
to the insight that a subsection is not different from
a subsubsection: the latter only happens to be at a
structurally deeper level than the former.
Accordingly, in indented text, what you do in order to
*move* the level is you *move* the text. So much for
the 'Indention Is Unnatural' argument.
If someone thinks one must have concrete, absolute
section levels, and there may be situations where they
are advantageous, please make a proposal that shows
the user section levels and not wiggly vs. dotted vs.
dashed single and double lines. I suggested elsewhere
to introduce proper commands (I find the 'directives'
of the proposal wholly unsatisfactory, especially in
the context of a scripting language), and I used
double semicolons for the demonstration. Therefore,
one could use
;;h My Title
    The body of the section
    goes into subsequent blocks.
    ;;h Another Title
        The body of the subsection
        goes into subsequent blocks.
for relative heading-body pairs and
;;h3 My Title
The body of the section
goes into subsequent blocks.
;;h+1 Another Title
The body of the subsection
goes into subsequent blocks.
for absolute and relative, explicit markups.
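How such commands might resolve to concrete levels can be sketched in a
few lines of Python. Everything here is an assumption made for the sake
of illustration: the name heading_levels, the reading of a plain ';;h'
as "level follows from indention" (four spaces per step), ';;h3' as an
absolute level, and ';;h+1' as an offset from the previous heading.

```python
import re

# Optional leading spaces, ';;h', an optional absolute number or signed
# offset, then the title text.
HEADING = re.compile(r"^(?P<indent> *);;h(?P<spec>[+-]?\d+)?\s+(?P<title>.+)$")

def heading_levels(text, width=4):
    """Return a list of (level, title) for every ';;h' line in `text`."""
    previous = 0
    found = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue                              # ordinary body text
        spec = match.group("spec")
        if spec is None:                          # ';;h': level from indention
            level = len(match.group("indent")) // width + 1
        elif spec[0] in "+-":                     # ';;h+1': relative, explicit
            level = previous + int(spec)
        else:                                     # ';;h3': absolute
            level = int(spec)
        found.append((level, match.group("title")))
        previous = level
    return found

EXPLICIT = (";;h3 My Title\n"
            "The body of the section\n"
            ";;h+1 Another Title\n"
            "The body of the subsection\n")

RELATIVE = (";;h My Title\n"
            "    The body of the section\n"
            "    ;;h Another Title\n"
            "        The body of the subsection\n")
```

In this reading both inputs describe a title with one subsection; the
explicit form places them at levels 3 and 4, the relative form at
levels 1 and 2.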
Additionally, again as stated elsewhere, I think it is
advantageous and more systematic to introduce
explicit, if somewhat lengthy, extensible,
self-explaining commands and only then associate these, as
far as there is need and mutual agreement, with
typographic situations ("Single line, no punctuation
at end, followed by indented blocks" and so on). This
kind of procedure gives authors much more orientation
and feature-safety.
2c: Indention-Capable Software Not Available?
In point (2c), David says that one "must use block-
indent and -unindent functions[, features of]
relatively advanced text editors". Well, at least we
*do* have *some* editors that have functionality to
perform the indenting and unindenting of groups of lines
-- can anyone name a text editor that has a similar
functionality to perform underlining? Can anyone,
please, point out a text editor that does all of
these:
* do *both* over- *and* underlining,
* keep track of the characters in over- and
underlining being the same,
* keep both over- and underline at the same
length,
* keep both over- and underline at least as long
as the right edge of the intervening title.
I do not know any editor with any of these
capabilities (or 'awarenesses'). Sure, you can write
an Emacs macro to do that, but then, Emacs is exactly
that kind of "relatively advanced" software that
David does not want to be forced to use (nor do I).
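For what it is worth, the four requirements listed above are easy to
state in code, which only underlines that it is editor support, not
difficulty, that is missing. The following Python sketch uses made-up
names (decorate, is_well_formed) and is not part of any proposal.

```python
def decorate(title, char="=", overline=True):
    """Return `title` with an underline (and optionally an overline) of `char`."""
    if len(char) != 1:
        raise ValueError("decoration must be a single character")
    rule = char * len(title)
    lines = [rule, title, rule] if overline else [title, rule]
    return "\n".join(lines)

def is_well_formed(over, title, under):
    """Check the requirements: same character, same length, long enough."""
    return (
        over == under                             # same characters, same length
        and len(set(under)) == 1                  # one character, repeated
        and len(under) >= len(title.rstrip())     # reaches the title's right edge
    )
```

For example, decorate("Bolts") yields three lines, "=====", "Bolts" and
"=====", while is_well_formed("======", "Levers", "-----") is False
because the two lines differ.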
Moreover, if indention is only available in
"relatively advanced text editors", as David observes,
then, please! where is the editor, apart from Emacs,
that supports the proposed table format? I only know a
very few that support a 'line drawing mode' (ie,
moving the cursor leaves a line as trace; linestyles,
intersections etc.), but that is a very far cry from
being able to draw (or manage) *tables*.
I for sure am one who, as a reader, would definitely
enjoy more readable tabular data in plain text. As an
author, however, I am loath to find myself being
obliged to use Emacs (and I know the program) only
because that's the only software in the world that
knows how to decently handle ASCII tables (as an
*optional* format I can, of course, only welcome the
proposal for tables).
It is not quite clear to me how to sell this: First,
the well-established device of indention is more or
less (but not entirely) thrown out, partly on the
grounds that current text editors are purportedly not
able to handle it (or perhaps make it difficult to use
"block-indent and -unindent functions" -- I use the
tab key for that purpose). Then, a format for section
headings is suggested that current editing software is
plainly ignorant about. Next, a table format is
introduced that in 99% of all editors turns out plain
hell as soon as a single cell has to change size (try
it once). This argumentation fails to convince me.
3: Indention A Valid Generalization
I contend that indention is a valid generalization to
indicate the structure of a given text. It is precisely
the generalization expressible by indention that is
missing in, for example, HTML: In HTML, you put one
heading into the text, then a paragraph, then another
heading, again followed by a paragraph. While this in
itself suffices to indicate the structure, it is not
quite obvious why, in another case, both list items,
which are members (dependants) of a list, and the list
itself are made structurally explicit. Clearly, this
difference in treatment is unjustified, although
practical reasons may be found. In theory, an HTML
markup should (and, syntactically, could) look
something like this:
<section>
    <title>My Title</title>
    <body>
        <p>The body of the section
        goes into subsequent blocks.</p>
        <section>
            <title>Another Title</title>
            <body>
                <p>The body of the subsection
                goes into subsequent blocks.</p>
            </body>
        </section>
    </body>
</section>
Of course, this is sort of a markup-overkill for the
weathered indentionist.
The drawback of the HTML view is simply that the
structure of the markup is not as congruent with the
conceptual structure of a document as would be
possible.
Conceptually, a chapter 'has' a 'heading' and
'contains' 'text', which in turn may be divided into
'sections' and so on.
In HTML, however, a 'heading' has a 'level' and
'precedes' a 'text', and perhaps another 'heading' of
another 'level'.
In the first view, it is *level* that follows from the
structure, while in the second, it is the *structure*
that must be deduced from the 'levels'. This, indeed,
is a rather decisive difference.
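The two views can be put side by side in a short Python sketch. The
representation is an assumption made for this example: a section is a
(title, children) pair in the nested view and a (level, title) pair in
the flat view, with levels starting at 1; flatten and nest are
hypothetical helper names.

```python
def flatten(section, level=1):
    """Nested view to flat view: the level simply follows from the nesting."""
    title, children = section
    flat = [(level, title)]
    for child in children:
        flat.extend(flatten(child, level + 1))
    return flat

def nest(flat):
    """Flat view to nested view: the structure must be deduced from levels."""
    root = ("", [])
    stack = [(0, root)]                   # (level, section) of open ancestors
    for level, title in flat:
        while stack[-1][0] >= level:      # close everything not above us
            stack.pop()
        node = (title, [])
        stack[-1][1][1].append(node)      # attach to the nearest ancestor
        stack.append((level, node))
    return root[1]

DOC = ("My Title", [("Another Title", [])])
```

Going from the nested form to levels is a plain walk of the tree; going
back needs a stack and a rule for what a given level number means in
context.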
According to David's proposal, docstrings would suffer
from the same lack of sound generalization, with all
difficulties, as HTML documents.
One More Remark, A Caveat And Conclusion
Apart from the treatment of indention in the proposal, I
also have some doubts about the fitness of the proposed
markups for definition lists (number 8 of the proposal)
and literal blocks (number 9). In the first case, the
proposed markup appears somehow too volatile to me, in the
second, it is quite arbitrary. Again, wouldn't we be
better off with markup to signify 'commands' or 'role
indicators'? Then, taking ';;' for the purpose of
demonstration, we could have
;;glossary
    foo
        A subspecies of gnu.
    doo-foo
        Mythical animal; a winged foo.
    rants
        Wide-eyed geckoes.
as well as
;;lit
    Literally, a line.
It is, then, still possible to associate more
unobtrusive, less explicit formatting characteristics
with these or other data formats; however, that choice
would be much more configurable and explicit than the
procedures presently proposed. I think it is more
promising to make an extensible, explicit scheme and then
allow shortcuts to those features than to bind some non-
explicit markup early in the decision process to some very
specific purpose.
(BTW, is there a distinction between 'literal' segments
and 'code' segments? That would be important for coloring
and formatting).
Now for the caveat announced above. Yes, the proposal is
right, those underlines do somehow stick out, I admit
that. But, isn't that more appropriately effected with a
line of dashes in a comment within the docstring?
Since lines of dashes and the like would then be free
again I suggest that a concrete markup is used for
horizontal rules. HRs are very practical in long texts,
although typographers and web design advisors discourage
their use (but those people don't deal with long-running,
single-page technical documentation). And what markup
could be more suggestive than lines made up of whitespace
plus nothing but repetitions of any one of these
characters: '-.,;:_#+*~'(etc.).
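Recognizing such a line takes one regular expression. The sketch below
is an assumption-laden illustration: the name is_rule is made up, the
character set is just the one quoted above (the "etc." is left open),
and the minimum of three repetitions is my own choice.

```python
import re

MIN_LENGTH = 3      # assumed minimum number of repeated characters

# Optional whitespace, one character from the set, the same character
# repeated, optional whitespace, and nothing else on the line.
RULE = re.compile(r"^\s*([-.,;:_#+*~])\1{%d,}\s*$" % (MIN_LENGTH - 1))

def is_rule(line):
    """True if `line` consists only of whitespace and one repeated rule character."""
    return RULE.match(line) is not None
```

A line of dashes or tildes qualifies, with or without surrounding
whitespace; a mixed line such as "-.-.-." does not, since only
repetitions of one single character count.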
Concluding, I urge everybody not to abolish indention.
That wouldn't be very Python, I'm afraid.
Wolfgang Lipp
castor@snafu.de
lipp@epost.de
full quote:
>3. Structure via Indentation
>============================
>Setext_ required that body text be indented by 2 spaces. The original
>StructuredText_ and StructuredTextNG_ require that section structure be
>indicated through indentation, as "inspired by Python". For certain
>structures (outlines, lists, literal blocks, block quotes) indentation
>naturally indicates structure or hierarchy. For section structure,
>indentation is unnatural and awkward. Rather, the style of the section
>title should indicate its structure.
>In the original StructuredText, sections consist of one-line title
>paragraphs followed by indented paragraphs and other body elements. Using
>indentation is:
>- Unnatural. Most published works use title style (type size, face, weight,
> and position) and/or section/subsection numbering rather than indentation
> to indicate hierarchy. When indentation is used, it is usually the
> formatted end-result and is there for aesthetic rather than structural
> purposes.
>- Awkward. One must think about the formatting as the text is keyed in. And
> when structural changes are made (it is very common during the
> composition of a document to rearrange sections and their hierarchy) we
> must use block-indent and -unindent functions. In order to edit documents
> using indentation, relatively advanced text editors must be used.
>Python's significant whitespace is a wonderful innovation (even if not
>original to Python), however applying indentation to ordinary written text
>is hypergeneralization.
>reStructuredText_ indicates section structure through title style (as
>exemplified by this document). This is far more natural. In fact, it is
>already in widespread use in plain text documents, including in Python's
>standard distribution (such as the toplevel README_ file).