[PYTHON DOC-SIG] setext in doc strings
I've been working with Daniel Larsson on gendoc. Currently there is a little setext parser built into gendoc which identifies text structure and stores the components in a metadocument which can be rendered in a number of output formats (notably HTML and MML). Since most folks are not necessarily familiar with setext markup I'd like to provide a brief synopsis. If you use this stuff in your doc strings nice things will happen to your autogenerated manuals. :-)

SETEXT 101
==========

Below are the setext definitions from the BSDI project. Note that not all tags are supported (or needed) in python doc strings.

Valid Typotags Table
---------------------

____________________ ___________________ _______________ ____________ v14
current (online) use setext form         acted upon or   name of
of text emphasis     of same             displayed as    the typotag   ?
==================== =================== =============== ============ ===
Internet mail header From <source>       Subject: shown  subject-tt   (a)
(start of a message) minimal mail header [Date: & From:]
-------------------- ------------------- --------------- ------------ ---
title (1 per text)   "Title              a title         title-tt     (b)
in distinct position  ====="             in chosen style
-------------------- ------------------- --------------- ------------ ---
heading (1+/ text)   "Subhead            a subhead       subhead-tt   (c)
in distinct position  -------"           in chosen style
-------------------- ------------------- --------------- ------------ ---
body text            66-char lines in-   lines undented  indent-tt    (d)
[plain not-indented] dented by 2 space   and unfolded
-------------------- ------------------- --------------- ------------ ---
1+ bold word(s)      **[multi]word**     1+ bold word(s) bold-tt      (e)
a single italic word ~word~              1 italic word   italic-tt    (f)
1+ underlined words  [_multi]_word_      underlined text underline-tt (g)
hypertextual 1+ word [multi_]word_       1+ hot word(s)  hot-tt       (h)
followed by text     >[space][text]      > [mono-spaced] include-tt   (i)
bullet-text in pos1  *[space][text]      [bullet] [text] bullet-tt    (j)
                     `_quoted typotag!_` `_left alone!_` quote-tt     (k)
-------------------- ------------------- --------------- ------------ ---
[hypertext link def] ^.. _word URL       jump to address href-tt      (l)
[hypertext note def] ^.. _word Note:("*") ("cause error") note-tt     (m)
-------------------- ------------------- --------------- ------------ ---
end of first? setext $$ [last on a line] [parse another] twobuck-tt   (n)
                     ^..[space][not dot] [line hidden]   suppress-tt  (o)
logical end of text  ^..[alone on a line] [taken note of] twodot-tt   (p)
==================== =================== =============== ============ ===

Note: only one instance of the element (c) (or, in its absence, (b)) is absolutely _required_ for a text to be considered a valid setext.

All the elements but (c) are in effect optional, not necessary for a setext to be declared as such. Element (a) deals with setexts that arrive via email and end up being parsed (processed) as unedited mailbox files; fully employed the (a), (b) and (c) make it possible to distribute "multisetexts", i.e. setexts with one additional level of logical structure (= more than one setext per message; more than one message in a mailbox). If such file is viewed as a multisetext it will result in 3-level-outline structure: mail-subjects become top-level chapters, setext titles denote subchapters (topics) and the subheads yet finer threads within these (still a notch ABOVE mere "paragraphs of text").

$$
-----------------------------------------------------------------------

The following doc string example illustrates the usage of all setext constructs recognized by the gendoc tool. (i think)

class Setext(Text):
    """Lets you change markup to stylize your text

    SETEXT 102
    ==========

    **Setext** can be used to mark your text in a non-obtrusive manner. Text within double asterisks are treated as bold, while single words with tilde at the front and back are rendered as ~Italic~. You can _underline_a_phrase_ but it will be rendered as bold in HTML. Placing hyperlinks is easy; just hilite_the_tag_ and at the bottom of the doc string include the address which it points to on a line by itself.

    New paragraphs are separated by blank lines.

    > And a bunch of literal text
    > can be specified with the left
    > arrow. This gets marked as <pre> in HTML.

    Otherwise the text will be wrapped according to whatever output formatter is used.

    A bulleted list is done with single asterisks thusly:

    * Lettuce
    * Onions
    * Pickles

    Extension to setext
    -------------------

    A frequent construct in python doc strings is to list one's keyword arguments. This made us wish for a way to specify a definition list so that it looks nice in HTML (and others). I propose the following. I have this working in my version. The double colons won't be in the output.

    item1 :: definition 1
    item2 :: definition 2
    item3 :: a rather long and involved definition for item 3
             spanning more than one line.
    item4 :: back to brevity with definition 4

    .. _hilite_the_tag http://www.python.org
    """

Notes:

The indenting inserted by python-mode for the entire doc string is detected and processed out before setext rules are applied. So even though titles, for example, are required to start in column one, they will count as doing so if they obey the overall indenting for that doc string.

The underlines for the title and subtitle should be the same length as the title itself.

Spaces around tokens are important (for the "* ", "> ", and " :: ")

Comments are hereby welcome.

-Robin Friedrich

=================
DOC-SIG - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
=================
Robin Friedrich wrote:
I've been working with Daniel Larsson on gendoc. Currently there is a little setext parser built into gendoc which identifies text structure and stores the components in a metadocument which can be rendered in a number of output formats (notably HTML and MML). Since most folks are not necessarily familiar with setext markup I'd like to provide a brief synopsis. If you use this stuff in your doc strings nice things will happen to your autogenerated manuals. :-)
First, I apologize for the tardiness of my reply. I spent some time looking at setext after the workshop and was fairly underwhelmed. The setext documents I looked at were sort of ugly in their basic form, and the example setext converted to HTML was often broken. I also had a tough time making out the setext documentation, which colored my opinion somewhat.

In a separate note, I released a Structured text module that I consider to be superior to setext in several ways:

- The source text is more readable,
- It supports arbitrary levels of nesting, including numbered, bulleted and descriptive lists.
- It generates HTML tags like <strong> and <em>, rather than <bold> and <i>.
SETEXT 101
==========
Below are the setext definitions from the BSDI project. Note that not all tags are supported (or needed) in python doc strings.
This looks like the documentation I found for setext. I had trouble making it out then and have trouble making it out now. :-|
Valid Typotags Table
---------------------

____________________ ___________________ _______________ ____________ v14
current (online) use setext form         acted upon or   name of
of text emphasis     of same             displayed as    the typotag   ?
==================== =================== =============== ============ ===
Internet mail header From <source>       Subject: shown  subject-tt   (a)
(start of a message) minimal mail header [Date: & From:]
I assume this doesn't apply to us?
-------------------- ------------------- --------------- ------------ ---
title (1 per text)   "Title              a title         title-tt     (b)
in distinct position  ====="             in chosen style
Is gendoc using this? This mechanism of setext is rather restrictive and ugly.
-------------------- ------------------- --------------- ------------ ---
heading (1+/ text)   "Subhead            a subhead       subhead-tt   (c)
in distinct position  -------"           in chosen style
Ditto.
-------------------- ------------------- --------------- ------------ ---
body text            66-char lines in-   lines undented  indent-tt    (d)
[plain not-indented] dented by 2 space   and unfolded
Ditto.
-------------------- ------------------- --------------- ------------ --- 1+ bold word(s) **[multi]word** 1+ bold word(s) bold-tt (e)
*multi word* would be more readable and follows standard conventions. I think emphasis is better than bold. This is what I did in StructuredText.
a single italic word ~word~ 1 italic word italic-tt (f)
This looks ugly. Why specify italic directly? Doesn't this run counter to HTML philosophy? If the group wants this, I'd be willing to add it to StructuredText. If I do, what constitutes a 'word'?
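For illustration, the two inline typotags discussed above could be mapped onto the emphasis-style tags mentioned earlier in this reply. This is only a sketch: `render_inline` and both regular expressions are hypothetical names, not taken from gendoc or StructuredText, and a 'word' for `~word~` is assumed to be a single run of letters, digits, or underscores.

```python
import html
import re

# Assumption: bold text may span several words but must not start or end
# with whitespace; italic covers exactly one "word" (\w+).
BOLD = re.compile(r"\*\*(\S(?:.*?\S)?)\*\*")
ITALIC = re.compile(r"~(\w+)~")


def render_inline(text):
    """Escape HTML specials, then turn **x** and ~x~ into <strong>/<em>."""
    escaped = html.escape(text, quote=False)
    escaped = BOLD.sub(r"<strong>\1</strong>", escaped)
    return ITALIC.sub(r"<em>\1</em>", escaped)


print(render_inline("**Setext** marks ~Italic~ text & more"))
# <strong>Setext</strong> marks <em>Italic</em> text &amp; more
```

Mapping to `<strong>` and `<em>` rather than presentational tags is a design choice made here for the example; a formatter could equally emit bold and italic directly.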
1+ underlined words [_multi]_word_ underlined text underline-tt (g)
What constitutes a word? Does this run afoul of multi_word_python_variable_names?
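One way to explore the question is to write the underline and hot-word forms as patterns and try them on an identifier. The sketch below assumes a rule that is not confirmed by gendoc or the setext definitions: the typotag must be delimited by non-word characters on both sides. `UNDERLINE` and `HOT` are hypothetical names.

```python
import re

# Assumed rule: _multi_word_ needs a leading AND a trailing underscore,
# neither of which may touch another letter, digit, or underscore.
UNDERLINE = re.compile(
    r"(?<![A-Za-z0-9_])_([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?![A-Za-z0-9_])"
)
# Assumed rule: a hot word has only the trailing underscore.
HOT = re.compile(
    r"(?<![A-Za-z0-9_])([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?![A-Za-z0-9_])"
)

print(UNDERLINE.findall("You can _underline_a_phrase_ here"))
# ['underline_a_phrase']
print(UNDERLINE.findall("multi_word_python_variable_names"))
# []
print(HOT.findall("just hilite_the_tag_ and so on"))
# ['hilite_the_tag']
```

Under this assumed rule an ordinary identifier is left alone, but a name that really does begin and end with a single underscore, or one that ends with a trailing underscore, would still be picked up, so the ambiguity is reduced rather than removed.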
hypertextual 1+ word [multi_]word_ 1+ hot word(s) hot-tt (h)
This is weird. Where is the reference? Has this been implemented in gendoc?
followed by text >[space][text] > [mono-spaced] include-tt (i)
This looks like a quoted email message. But I guess it makes sense.
bullet-text in pos1 *[space][text] [bullet] [text] bullet-tt (j)
I think 'o text' and '- text' are more readable.
`_quoted typotag!_` `_left alone!_` quote-tt (k)
`_e_gads!_` I like 'this much better'
-------------------- ------------------- --------------- ------------ ---
[hypertext link def] ^.. _word URL       jump to address href-tt      (l)
[hypertext note def] ^.. _word Note:("*") ("cause error") note-tt     (m)
I have no idea what this means.
-------------------- ------------------- --------------- ------------ ---
end of first? setext $$ [last on a line] [parse another] twobuck-tt   (n)
                     ^..[space][not dot] [line hidden]   suppress-tt  (o)
logical end of text  ^..[alone on a line] [taken note of] twodot-tt   (p)
Huh?
==================== =================== =============== ============ ===
Note: only one instance of the element (c) (or, in its absence, (b)) is absolutely _required_ for a text to be considered a valid setext.
All the elements but (c) are in effect optional, not necessary for a setext to be declared as such. Element (a) deals with setexts that arrive via email and end up being parsed (processed) as unedited mailbox files; fully employed the (a), (b) and (c) make it possible to distribute "multisetexts", i.e. setexts with one additional level of logical structure (= more than one setext per message; more than one message in a mailbox). If such file is viewed as a multisetext it will result in 3-level-outline structure: mail-subjects become top-level chapters, setext titles denote subchapters (topics) and the subheads yet finer threads within these (still a notch ABOVE mere "paragraphs of text").
$$
-----------------------------------------------------------------------

The following doc string example illustrates the usage of all setext constructs recognized by the gendoc tool. (i think)
class Setext(Text):
    """Lets you change markup to stylize your text
    SETEXT 102
    ==========
This is not valid setext. Setext wants the titles and headings to start in column 1 and the other text in column 3, like this:

SETEXT 102
==========

  **Setext** can be used to mark your text in a non-obtrusive manner.
  Text within double asterisks are treated as bold, ...
**Setext** can be used to mark your text in a non-obtrusive manner. Text within double asterisks are treated as bold, while single words with tilde at the front and back are rendered as ~Italic~. You can _underline_a_phrase_ but it will be rendered as bold in HTML. Placing hyperlinks is easy; just hilite_the_tag_ and at the bottom of the doc string include the address which it points to on a line by itself.
New paragraphs are separated by blank lines.

> And a bunch of literal text
> can be specified with the left
> arrow. This gets marked as <pre> in HTML.

Otherwise the text will be wrapped according to whatever output formatter is used.
A bulleted list is done with single asterisks thusly:

* Lettuce
* Onions
* Pickles
    Extension to setext
    -------------------
Ditto.
A frequent construct in python doc strings is to list one's keyword arguments. This made us wish for a way to specify a definition list so that it looks nice in HTML (and others). I propose the following. I have this working in my version. The double colons won't be in the output.
item1 :: definition 1
item2 :: definition 2
item3 :: a rather long and involved definition for item 3
         spanning more than one line.
item4 :: back to brevity with definition 4
Why not:

item1 -- Definition 1
...

This looks much better to me, and works with StructuredText.
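Either separator can be handled by the same small routine. The following is a minimal sketch, not the gendoc or StructuredText implementation: `parse_definitions` is a hypothetical helper, and it assumes that a non-blank line without the separator continues the previous definition.

```python
def parse_definitions(lines, sep=" :: "):
    """Return (term, definition) pairs from 'term SEP definition' lines."""
    items = []
    for line in lines:
        if sep in line:
            # Split on the first separator only; the spaces around it matter.
            term, definition = line.split(sep, 1)
            items.append([term.strip(), definition.strip()])
        elif line.strip() and items:
            # Assumed rule: a line without the separator is a continuation.
            items[-1][1] += " " + line.strip()
    return [tuple(item) for item in items]


example = [
    "item1 :: definition 1",
    "item3 :: a rather long and involved definition for item 3",
    "         spanning more than one line.",
    "item4 :: back to brevity with definition 4",
]
for term, definition in parse_definitions(example):
    print(term, "=>", definition)
```

Passing `sep=" -- "` accepts the alternative spelling suggested in the reply without changing the rest of the logic.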
    .. _hilite_the_tag http://www.python.org
    """
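The link definition at the bottom of the doc string can be collected with a short scan over the lines. This is a sketch under assumptions: `collect_link_targets` and `LINK_DEF` are hypothetical names, and a definition is taken to be a line of the form `.. _word URL` once surrounding whitespace is removed.

```python
import re

# Assumed form of a link definition: two dots, a space, an underscore-prefixed
# word, whitespace, and then the address, all on one line.
LINK_DEF = re.compile(r"^\.\. _(\S+)\s+(\S+)\s*$")


def collect_link_targets(text):
    """Map each defined hot word to the address given for it."""
    targets = {}
    for line in text.splitlines():
        match = LINK_DEF.match(line.strip())
        if match:
            targets[match.group(1)] = match.group(2)
    return targets


doc = "just hilite_the_tag_ and move on\n\n.. _hilite_the_tag http://www.python.org\n"
print(collect_link_targets(doc))
# {'hilite_the_tag': 'http://www.python.org'}
```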
Notes:
The indenting inserted by python-mode for the entire doc string is detected and processed out before setext rules are applied. So even though titles, for example, are required to start in column one, they will count as doing so if they obey the overall indenting for that doc string.
Hm.
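The indentation handling described in the note can be sketched as follows. This is an assumption about the approach, not gendoc's actual code: `strip_docstring_indent` is a hypothetical helper that removes the margin shared by every non-blank line after the first, so that titles end up in column one again.

```python
def strip_docstring_indent(doc):
    """Remove the indentation common to all lines after the first one."""
    lines = doc.expandtabs().splitlines()
    if not lines:
        return ""
    # The first line follows the opening quotes, so it carries no margin.
    rest = [ln for ln in lines[1:] if ln.strip()]
    margin = min((len(ln) - len(ln.lstrip()) for ln in rest), default=0)
    out = [lines[0].strip()]
    out += [ln[margin:].rstrip() for ln in lines[1:]]
    return "\n".join(out).strip("\n")


doc = """Lets you change markup

    SETEXT 102
    ==========

      body text two columns further in
    """
print(strip_docstring_indent(doc))
```

The standard library's `inspect.cleandoc` performs a very similar clean-up and may be preferable to a hand-written version.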
The underlines for the title and subtitle should be the same length as the title itself.
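The length rule makes headings straightforward to detect. A minimal sketch, assuming the text has already had its common indentation removed and that `=` marks a title and `-` a subhead; `find_headings` is a hypothetical name, not part of gendoc.

```python
def find_headings(text):
    """Return (kind, title) pairs for lines underlined at equal length."""
    lines = text.splitlines()
    found = []
    for title, underline in zip(lines, lines[1:]):
        title = title.rstrip()
        underline = underline.rstrip()
        if not title or not underline:
            continue
        # Titles must start in column one and match the underline length.
        if title[0].isspace() or len(underline) != len(title):
            continue
        if set(underline) == {"="}:
            found.append(("title", title))
        elif set(underline) == {"-"}:
            found.append(("subhead", title))
    return found


sample = "SETEXT 102\n==========\n\n  body\n\nToo short\n---\n"
print(find_headings(sample))
# [('title', 'SETEXT 102')]
```

Requiring an exact length match is the strict reading of the note; a more forgiving parser might accept any underline at least as long as the title.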
Spaces around tokens are important (for the "* ", "> ", and " :: ")
Comments are hereby welcome.
I think my structured text module mechanism provides richer text formatting with less obtrusive markup, especially for strings that have much structure, as many of mine do.

Jim

--
Jim Fulton
Digital Creations
jim@digicool.com
540.371.6909
## Python is my favorite language ##
## http://www.python.org/ ##