[PYTHON DOC-SIG] setext in doc strings

Jim Fulton jim.fulton@digicool.com
Mon, 05 Aug 1996 16:42:03 -0400

Robin Friedrich wrote:
> I've been working with Daniel Larsson on gendoc. Currently there is a
> little setext parser built into gendoc which identifies text structure
> and stores the components in a metadocument which can be rendered in a
> number of output formats (notably HTML and MML).  Since most folks are
> not necessary familiar with setext markup I'd like to provide a brief
> synopsis. If you use this stuff in your doc strings nice things will
> happen to your autogenerated manuals.:-)

First, I apologize for the tardiness of my reply.

I spent some time looking at setext after the workshop and was fairly
underwhelmed.  Actually, the setext documents I looked at were sort of ugly
in their basic form and example setext converted to html was often
I also had a tough time making out the setext documentation, which
my opinion somewhat.

In a separate note, I released a Structured text module that I consider 
to be superior to setext in several ways:

  - The source text is more readable,
  - It supports arbitrary levels of nesting, including numbered,
    and descriptive lists.
  - It generates HTML tags like <strong> and <em>, rather than <bold>
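
To make the last point concrete, here is a minimal sketch in present-day
Python of how *emphasis* and **strong** spans could be turned into <em>
and <strong> tags.  It is an illustration only, not the code in the
StructuredText module; the name inline_to_html and both regular
expressions are assumptions made up for this example.

```python
import html
import re

# Hypothetical patterns for this sketch: a span must start and end with a
# non-space character, so a bullet such as "* Lettuce" is left alone.
_STRONG = re.compile(r"\*\*(\S(?:.*?\S)?)\*\*")
_EM = re.compile(r"\*(\S(?:.*?\S)?)\*")


def inline_to_html(text):
    """Convert **strong** and *emphasis* spans in one line to HTML tags."""
    # Escape &, < and > first so literal text cannot be read as markup.
    escaped = html.escape(text, quote=False)
    # Handle the double-asterisk form first so it is not taken for emphasis.
    escaped = _STRONG.sub(r"<strong>\1</strong>", escaped)
    return _EM.sub(r"<em>\1</em>", escaped)
```

Calling inline_to_html("a *multi word* phrase") wraps "multi word" in
<em> tags and leaves the rest of the line untouched.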

> SETEXT 101
> ==========
> Below is the setext definitions from the BSDI project. Note that not
> all tags are supported (or needed) in python doc strings.

This looks like the documentation I found for setext.  I had trouble
making it out then and have trouble making it out now. :-|

> Valid Typotags Table
> ---------------------
>  ____________________  ___________________  _______________ ____________ v14
>  current (online) use  setext form          acted upon or        name of
>  of text emphasis      of same              displayed as     the typotag  ?
>  ====================  ===================  =============== ============ ===
>  Internet mail header  From <source>        Subject: shown    subject-tt (a)
>  (start of a message)  minimal mail header  [Date: & From:]

I assume this doesn't apply to us?

>  --------------------  -------------------  --------------- ------------ ---
>  title (1 per text)   "Title                a title             title-tt (b)
>  in distinct position  ====="               in chosen style

Is gendoc using this?  This mechanism of setext is rather restrictive
and ugly.

>  --------------------  -------------------  --------------- ------------ ---
>  heading (1+/ text)   "Subhead              a subhead         subhead-tt (c)
>  in distinct position  -------"             in chosen style


>  --------------------  -------------------  --------------- ------------ ---
>  body text               66-char lines in-  lines undented     indent-tt (d)
>  [plain not-indented]    dented by 2 space  and unfolded


>  --------------------  -------------------  --------------- ------------ ---
>  1+ bold word(s)           **[multi]word**  1+ bold word(s)      bold-tt (e)

*multi word* would be more readable and follows standard conventions.  I
think emphasis is better than bold.  This is what I did in StructuredText.

>  a single italic word               ~word~  1 italic word      italic-tt (f)

This looks ugly.  Why specify italic directly?  Doesn't this run counter

If the group wants this, I'd be willing to add it to StructuredText.  
If I do, what constitutes a 'word'?

>  1+ underlined words        [_multi]_word_  underlined text underline-tt (g)

What constitutes a word?  Does this run afoul of

>  hypertextual 1+ word        [multi_]word_  1+ hot word(s)        hot-tt (h)

This is weird.  Where is the reference?  Has this been implemented in

>  >followed by text     >[space][text]       > [mono-spaced]   include-tt (i)

This looks like a quoted email message.  But I guess it makes sense.

>  bullet-text in pos1   *[space][text]       [bullet] [text]    bullet-tt (j)

I think 'o text' and '- text' are more readable.
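
For what it is worth, accepting all three bullet spellings costs very
little.  The sketch below (present-day Python; the function name
bullet_items and the pattern are hypothetical, and this is not gendoc's or
StructuredText's parser) collects the text of every line that starts with
'o', '-' or '*' followed by whitespace:

```python
import re

# Assumed rule for this sketch: optional indent, one bullet character
# (o, * or -), at least one space, then the item text.
_BULLET = re.compile(r"^\s*[o*-]\s+(\S.*)$")


def bullet_items(text):
    """Return the text of each bullet line in `text`, in order."""
    items = []
    for line in text.splitlines():
        match = _BULLET.match(line)
        if match:
            items.append(match.group(1).rstrip())
    return items
```

A word that merely begins with 'o' (such as "only") is not treated as a
bullet, because the bullet character must be followed by whitespace.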

>                        `_quoted typotag!_`  `_left alone!_`     quote-tt (k)

`_e_gads!_`  I like 'this much better'

>  --------------------  -------------------  --------------- ------------ ---
>  [hypertext link def] ^.. _word URL         jump to address      href-tt (l)
>  [hypertext note def] ^.. _word Note:("*")  ("cause error")      note-tt (m)

I have no idea what this means.

>  --------------------  -------------------  --------------- ------------ ---
>  end of first? setext  $$ [last on a line]  [parse another]   twobuck-tt (n)
>                       ^..[space][not dot]   [line hidden]    suppress-tt (o)
>  logical end of text  ^..[alone on a line]  [taken note of]    twodot-tt (p)


>  ====================  ===================  =============== ============ ===
>  Note: only one instance of the element (c) (or, in its absence, (b))
>     is absolutely _required_ for a text to be considered a valid setext.
>  All the elements but (c) are in effect optional, not necessary for
>     a setext to be declared as such.  Element (a) deals with setexts
>     that arrive via email and end up being parsed (processed) as
>     unedited mailbox files; fully employed the (a), (b) and (c) make
>     it possible to distribute "multisetexts", i.e.  setexts with one
>     additional level of logical structure (= more than one setext per
>     message; more than one message in a mailbox).  If such file is
>     viewed as a multisetext it will result in 3-level-outline
>     structure: mail-subjects become top-level chapters, setext titles
>     denote subchapters (topics) and the subheads yet finer threads
>     within these (still a notch ABOVE mere "paragraphs of text").
>  $$
> -----------------------------------------------------------------------
> The following doc string example illustrates the usage of all setext
> constructs recognized by the gendoc tool. (i think)
> class Setext(Text):
>     """Lets you change markup to stylize your text
>     SETEXT 102
>     ==========

This is not valid setext.  Setext wants the titles and headings to start 
in column 1 and the other text in column 3, like this:


  **Setext** can be used to mark your text in a non-obtrusive
  manner. Text within double asterisks are treated as bold, ...

>     **Setext** can be used to mark your text in a non-obtrusive
>     manner. Text within double asterisks are treated as bold,
>     while single words with tilde at the front and back are
>     rendered as ~Italic~. You can _underline_a_phrase_ but it
>     will be rendered as bold in HTML. Placing hyperlinks
>     is easy; just hilite_the_tag_ and at the bottom of the doc
>     string include the address which it points to on a line by
>     itself.
>     New paragraphs are separated by blank lines.
>     > And a bunch of literal text
>     > can be specified with the left
>     > arrow. This gets marked as <pre> in HTML.
>     Otherwise the text will be wrapped according to whatever
>     output formatter is used.
>     A bulleted list is done with single asterisks thusly:
>     * Lettuce
>     * Onions
>     * Pickles
>     Extension to setext
>     -------------------


>     A frequent construct in python doc strings is to list ones
>     keyword arguments. This made us wish for a way to specify
>     a definition list so that it looks nice is html (and others).
>     I propose the following. I have this working in my version.
>     The double colons won't be in the output.
>     item1 :: definition 1
>     item2 :: definition 2
>     item3 :: a rather long and involved definition for item 3
>              spanning more than one line.
>     item4 :: back to brevity with definition 4

Why not:

      item1 -- Definition 1

This looks much better to me, and works with StructuredText.
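
As a rough illustration of how little machinery the ' -- ' form needs,
here is a minimal sketch in present-day Python.  It is not the
StructuredText implementation; the names parse_descriptive_list and to_dl,
and the rule that a line without ' -- ' continues the previous
definition, are assumptions for this example.

```python
import html


def parse_descriptive_list(text):
    """Split 'term -- definition' lines into (term, definition) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        term, sep, definition = line.partition(" -- ")
        if sep:
            entries.append((term.strip(), definition.strip()))
        elif entries:
            # Assumed rule: a line without ' -- ' continues the last entry.
            last_term, last_def = entries[-1]
            entries[-1] = (last_term, last_def + " " + line)
    return entries


def to_dl(entries):
    """Render the pairs as an HTML definition list."""
    rows = ["<dl>"]
    for term, definition in entries:
        rows.append("<dt>%s</dt><dd>%s</dd>"
                    % (html.escape(term), html.escape(definition)))
    rows.append("</dl>")
    return "\n".join(rows)
```

The separator never reaches the output, just as with the double colons
in the proposal above.
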
>     .. _hilite_the_tag http://www.python.org
>     """
> Notes:
> The indenting inserted by python-mode for the entire doc string is
> detected and processed out before setext rules are applied. So
> eventhough titles for example are required to start in column one they
> will if they obey the overall indenting for that doc string.

> The underlines for the title and subtitle should be the same length as
> the title itself.
> Spaces around tokens are important (for the "* ", "> ", and " :: ")
> Comment are hearby welcome.
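
On the indentation note: in present-day Python the standard library can
do this normalization, so a title written at the doc string's own indent
ends up in column one before any markup rules run.  The sketch below uses
inspect.cleandoc; the wrapper name strip_doc_indent is hypothetical, and
I am not claiming this is how gendoc itself does it.

```python
import inspect


def strip_doc_indent(doc):
    """Remove the indentation a doc string picks up from its code block."""
    # cleandoc strips leading whitespace from the first line, removes the
    # common indent of the remaining lines, and drops blank lines at
    # either end.
    return inspect.cleandoc(doc or "")
```

Indentation beyond the common margin survives, so body text indented
relative to a title keeps that relative indent.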

I think my structured text module mechanism provides richer 
text formatting with less obtrusive markup, especially for 
strings that have much structure, as many of mine do.


Jim Fulton         Digital Creations
jim@digicool.com   540.371.6909
## Python is my favorite language ##
##     http://www.python.org/     ##

DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org