[Doc-SIG] some ideas for reStructuredText & document model

David Goodger goodger@users.sourceforge.net
Thu, 28 Feb 2002 22:30:03 -0500

> > - Change footnote syntax from ``.. [1]`` to ``_[1]``?

> Hmm. Colour me ambivalent on this one

Me too. But it has potential. See my followup, where I propose ``_1.``
as the footnote marker syntax and ``1_`` for footnote references. I'm
much happier about these.

> I quite like being able to scan down the left hand side of the text
> looking for "..<space>" to allow me to find the things being
> referenced.

Could that just be familiarity talking? Tibs, the voice of the
status-quo! ;-)

Acceptance of the new syntax would require accepting that a footnote
*isn't* the same type of element as a comment or directive.

> >   Note that the body of the footnote need not be indented.

Please note that my "note" was wrong. Revised:

    Note that the body of the footnote still needs to be indented, by
    at least one space. Without indentation, a footnote could only
    contain one paragraph.

> I'd suggest producing an example with mixed footnotes and reference
> targets

Please do!

> > - Differentiate author-date "citations" (``[GVR2002]``)
> >   from numbered footnotes? Create a new set of DTD elements:
> >   "citation" and "citation_reference"?
> Surely that's a presentation issue, and thus only relevant to the
> Writer? - i.e., if the footnote is numbered (or anonymously numbered)
> then present it one way, otherwise another.

Actually, I'm thinking of interpreting, representing (in the doctree),
and processing footnotes and citations (& their references) in
different ways.
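
As a rough illustration only, the split could start from the label itself. The rule below is an assumption made for this sketch, not a settled design: all-digit labels (and ``#`` auto-numbered ones) count as footnotes, and anything else, such as ``GVR2002``, counts as a citation. The function name is hypothetical.

```python
def classify_label(label):
    """Guess whether a bracketed label names a footnote or a citation.

    Assumed rule (illustrative only): digits or a leading "#" mean
    footnote; any other label is treated as a citation.
    """
    if label.isdigit() or label.startswith("#"):
        return "footnote"
    return "citation"


for label in ("1", "#", "GVR2002"):
    print(label, "->", classify_label(label))
```

A parser could then create "footnote"/"footnote_reference" or "citation"/"citation_reference" elements depending on the result.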

> > - Render footnote references as superscripts without "[]"?
> This is definitely a Writer issue

I'm not so sure. I've since realized that my footnote/citation ideas
are all related, and point to changes in the reStructuredText syntax
as well as the docutils tree model. Don't forget, reStructuredText is
"what-you-see-is-what-you-get plaintext markup", so how it ends up
being rendered *does* influence the markup (we want the input to
resemble the output as much as possible). See my followup post,
"Reworking Footnotes", which I'm developing in alternatives.txt__.

__ http://structuredtext.sf.net/spec/alternatives.txt

> many HTML browsers, for instance, are rubbish at presenting
> superscripts (and I'd want to check out what lynx or links did with
> superscripts, too!), so if your HTML Writer were to adopt this
> approach, you might well get complaints.

I'm aiming this HTML Writer at late-model graphical browsers, and
writing it to standards: it uses HTML 4.01 (almost "strict") and CSS1
extensively, but no CSS2 (still not widely implemented, it seems).
Implementation variations are to be expected, and user feedback will
of course be welcome.

If the demand warrants (and if time allows and/or volunteers arise),
several HTML variant Writers could be developed: for HTML 3.2, for
text browsers, etc. Each could render the output as best suited.

> > - Make footnotes two-way, GNU-style? What if there are multiple
> >   references to a single footnote?
> Again, this is a Writer issue.

Agreed. A transform issue, anyway.

> > - Directive ideas: TOC (GNU-style two-way), endnotes *here*,
> >   citations *here*.
> Surely also influenced by a command line option, in the same way
> that the choice of fold in or call out indirect hyperlinks is a run
> time choice?
> (a [set of] command line option[s] to say that all notes or
> footnotes or whatever are to be gathered together at the end (of the
> document, section, etc.), and what the section they are gathered
> into should be called, are surely things to consider strongly as
> options to the Writer, and thus also command line options to the
> interface to the system using that Writer)

We could have command-line options for some defaults or variations,
but they're limited. When I write a document, I want to be able to
specify where to put the table of contents. Similarly for an index or
a collection of endnotes (although I could see establishing defaults,
especially for endnotes & citations).

For a hypothetical example::

    The Book of All

    .. toc:: Table of Contents
       :depth: 2


    Revealed herein for the first time: the true meaning
    of life. After years of painstaking research...

> > - Add a list of pending transforms to the document node,
> >   generated by directives? Or add an element, "pending" perhaps,
> >   which encapsulates the transform, the point at which to apply
> >   it, and any data required.
> Not sure I understand this, but I'm behind in my understanding of
> the system at the moment anyway...

I think an example will clarify this. Say you want a table of contents
in your document. The easiest way to specify where to put it is from
within the document, with a directive::

    .. toc::

But the "toc" directive can't do its work until the entire document
has been parsed (and possibly transformed to some extent). So we need
to leave a placeholder behind that will trigger the second phase of
the directive's processing. The directive can leave an element at that
point in the doctree, something like this::

    <pending directive="toc" ...other attributes...>
        ...any directive data...

Simultaneously, we add the "pending" node to a list attached to the
document's root node, so that a later stage of processing can easily
run all pending directive transforms.
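
To make the two-phase idea concrete, here is a rough Python sketch. Everything in it (the ``Node`` and ``Document`` classes, ``toc_directive``, ``build_toc``, ``run_pending``) is made up for illustration and is not the actual Docutils API. It assumes the transform only needs the finished document, and it builds a flat TOC from top-level sections.

```python
class Node:
    """Toy doctree node; a stand-in, not a real Docutils class."""

    def __init__(self, tag, **attributes):
        self.tag = tag
        self.attributes = attributes
        self.children = []

    def append(self, child):
        self.children.append(child)
        return child


class Document(Node):
    def __init__(self):
        super().__init__("document")
        self.pending = []  # placeholders waiting for the second phase


def toc_directive(document, parent):
    """First phase: leave a placeholder where the directive occurred."""
    placeholder = parent.append(Node("pending", directive="toc"))
    document.pending.append((parent, placeholder))


def build_toc(document):
    """Hypothetical transform: list the titles of top-level sections."""
    toc = Node("toc")
    for child in document.children:
        if child.tag == "section":
            toc.append(Node("entry", title=child.attributes["title"]))
    return toc


TRANSFORMS = {"toc": build_toc}


def run_pending(document):
    """Second phase: swap each placeholder for its transform's result."""
    for parent, placeholder in document.pending:
        transform = TRANSFORMS[placeholder.attributes["directive"]]
        index = parent.children.index(placeholder)
        parent.children[index] = transform(document)
    document.pending.clear()


doc = Document()
toc_directive(doc, doc)  # ".. toc::" is seen before any section exists
doc.append(Node("section", title="Introduction"))
doc.append(Node("section", title="Usage"))
run_pending(doc)  # only now can the TOC be filled in
print([entry.attributes["title"] for entry in doc.children[0].children])
```

Keeping the list on the root node means the second phase never has to search the tree for placeholders.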

> > - Add a "sidebar" element to the DTD? Like a generic admonition or
> >   floating mini-section. Useful for TOC, system messages section,
> >   abstract, etc.
> Presentation again - its usefulness depends entirely upon the
> target format

I think I didn't explain well enough. DocBook has a good example of
what I mean, the "simplesect", defined as "a section of a document
with no subdivisions":

    SimpleSect is one of the top-level sectioning elements in a
    component. There are three types of sectioning elements in
    DocBook:

    - Explicitly numbered sections, Sect1...Sect5, which must be
      properly nested and can only be five levels deep.

    - Recursive Sections, which are alternative to the numbered
      sections and have unbounded depth.

    - SimpleSects, which are terminal. SimpleSects can occur as the
      "leaf" sections in either recursive sections or any of the
      numbered sections, or directly in components.

    SimpleSects may be more convenient than numbered sections in some
    authoring environments because they can be moved around in the
    document hierarchy without renaming.

    None of the sectioning elements is allowed to "float" in a
    component. You can place paragraphs and other block elements
    before a section, but you cannot place anything after it.

    (From *DocBook: The Definitive Guide*, by Norman Walsh & Leonard
    Muellner; http://docbook.org/tdg/en/html/simplesect.html)

The Docutils tree implements recursive sections. A SimpleSect or a
"sidebar" is just like a section, except that it has no subsections,
and is allowed wherever a body element (list, table, etc.) is allowed,
but only at the top level of a section. In other words, sidebars cannot
nest inside body elements, so you can't have a sidebar inside a table,
a list, a block quote, etc.
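
The placement rule can be sketched as a small check over a toy tree. This is only an illustration under stated assumptions: the tree is plain ``(tag, children)`` tuples rather than real doctree nodes, the function name is hypothetical, and the document root is assumed to count as section level. The "no subsections" half of the rule is not checked here.

```python
# Assumption: the document root is treated like a section for placement.
SECTION_LEVEL = {"document", "section"}


def misplaced_sidebars(node, parent_tag=None, path=()):
    """Return the paths of sidebars whose parent is not a section."""
    tag, children = node
    here = path + (tag,)
    found = []
    if tag == "sidebar" and parent_tag not in SECTION_LEVEL:
        found.append("/".join(here))
    for child in children:
        found.extend(misplaced_sidebars(child, tag, here))
    return found


tree = ("document", [
    ("section", [
        ("paragraph", []),
        ("sidebar", [("paragraph", [])]),    # fine: directly in a section
        ("block_quote", [("sidebar", [])]),  # not allowed: inside a body element
        ("paragraph", []),                   # paragraphs may follow a sidebar
    ]),
])
print(misplaced_sidebars(tree))
```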

The existing abstract element is a fixed-title, fixed-position (in the
doctree) sidebar. The various admonition directives produce
fixed-title sidebar-equivalents (although they can nest inside body
elements). Sidebars would be useful for generic equivalents of
abstracts and admonitions. It gets tiring and cumbersome to keep
adding specialized elements to the DTD. I've already got the system
inserting a "Docutils System Messages" section. The DTD will need an
element to store a table of contents, and I'm sure we'll come up with
other specialized section-equivalents.

Advantages: Sidebars don't have to conform to section placement rules
(e.g., there can be paragraphs before and after, at the same level as
the sidebar). They can be excluded from a table of contents.

> (keep thinking of plain text, LaTeX and PDF as alternatives to
> HTML!)

Actually, I'm basing these ideas on my experience with document
analysis and SGML processing systems. I'm trying to come up with a
format-independent document representation. This is a case where I
don't have a DTD construct suited to the task.

> > - Add character processing? For example:
> >
> >   - "--" -> em-dash (or "--" -> en-dash, and "---" -> em-dash)
> Some of us would quite like these particular cases - I'd look for
> pre-existing conventions, though, as to how many hyphens mean what.


> >   - convert quotes to curly quote entities
> A presentation issue - for HTML, don't bother, as it's essentially
> impossible. For [La]TeX, don't bother as it will do it for you
> anyway(!).


> >   - various forms of ":-)" to smiley icons
> Can we all say "ick"?

What, wouldn't you like to see little yellow happy faces interspersed
in your documents? :-P

I think the Wiki people would. 8-}

> >   - others?
> Forced linebreak (without paragraph break) and non-breaking space
> are the other obvious ones, but syntax is problematical (as we've
> found before!).

I had those in rst-notes.txt already. I've merged it all together.
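
For the dash conversions raised earlier in this list, a minimal Python sketch of the second variant ("--" to en-dash, "---" to em-dash, which is also what TeX does) might look like the following. The function name is hypothetical, emitting Unicode characters is an assumption (the representation question is still open), and longer runs of hyphens are assumed to be something else, such as a rule, so they are left alone.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Match a run of exactly two or three hyphens, not part of a longer run.
_DASHES = re.compile(r"(?<!-)(-{2,3})(?!-)")


def convert_dashes(text):
    """Replace "--" with an en-dash and "---" with an em-dash."""
    def replace(match):
        return EM_DASH if len(match.group(1)) == 3 else EN_DASH
    return _DASHES.sub(replace, text)


print(convert_dashes("pages 3--5"))
print(convert_dashes("wait---what?"))
print(convert_dashes("well-known ---- rule"))  # left unchanged
```

Literal blocks and inline literals would have to be skipped by whatever calls this, since their text must pass through untouched.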

> >   How to represent entities in the text though? Unicode?
> I'd say that this is definitely getting too ambitious for this stage
> of the project.

Yes, probably. Just throwing it out there to see what comes back. Just
like the PEP process, recording all these questions, decisions, and
to-dos is useful for the future.

> Tibs
> (who still thinks that "::" blocks should be indented in the HTML
> output for neat appearance, but realises that others may disagree)

In my stylesheet, they are indented. Or should be. 2 em's worth.

BTW, what browser/version & OS/version are you using?

> .. [1] In the output at
>    http://structuredtext.sourceforge.net/spec/test.html, the example
>    admonitions are uniformly set into boxes that are wider than my
>    browser page. This presumably means that something in the
>    document is of fixed width, rather than relative.

In the test document, the only things with fixed widths are the
graphics (which were actually broken; no "title.png" file where it was
supposed to be) and the literal blocks. In my browser, the "Doctest
Blocks" literal block was the determining factor.

>    I suspect it may actually be the fault of the "docinfo" table at
>    the top of the document, in particular the "copyright" field,
>    which is also too wide. I don't, however, understand stylesheets
>    well enough to see any obvious problem...

I'm just learning about stylesheets myself. From my reading of the
specs, the docinfo table should expand to fill whatever width is
available, but shouldn't force the browser to scroll horizontally.
It's probably exactly as wide as the widest fixed-width element, which
is probably the doctest block. Could be browser-specific, though.
"Abandon hope, all ye who enter here."

David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net