[Doc-SIG] docutils status report

Tony J Ibbs (Tibs) tony@lsl.co.uk
Thu, 30 Nov 2000 14:34:18 -0000

Well, the Tools/compiler hint was just what I needed - it took very
little time to start extracting docstrings from a Python file, and
it's fairly clear how one would manage a package.
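
In current Python the Tools/compiler package is no longer needed, because the standard `ast` module does the same job. The sketch below assumes Python 3, and the helper name `extract_docstrings` is hypothetical. It collects module, class and function docstrings from source text, visiting only definitions that sit directly in a module, class or function body:

```python
import ast


def extract_docstrings(source, filename="<string>"):
    """Return (qualified name, docstring) pairs found in Python source text.

    Hypothetical helper: only definitions placed directly in a module, class
    or function body are visited, so a def inside an if/try block is skipped.
    """
    tree = ast.parse(source, filename)
    found = []

    def visit(node, qualname):
        # ast.get_docstring returns None when the body does not start with a string.
        doc = ast.get_docstring(node)
        if doc is not None:
            found.append((qualname or "<module>", doc))
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, f"{qualname}.{child.name}" if qualname else child.name)

    visit(tree, "")
    return found
```

Handling a package would then mean walking its directory and feeding each `.py` file's text to the same function.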

Anyway, here is a status report and some "to do" information:

I currently have some code that will:

  1. Find the docstrings in a Python file (currently itself!)
  2. Split the text into paragraphs at blank lines.
  3. Identify lines within a paragraph that start like a list
     item, and split there as well - this allows::

         This is a paragraph.
         1. So is this
         fred -- and so is this

     to produce three paragraphs. I think this is the main place where
     Python users want (or need) to be able to have "no blank lines".
  4. Identify the subparagraphs of a paragraph ending in "::" as literal
     text (as in STNG, the "::" alone is enough text to constitute "text
     in a paragraph", so the "empty header" idea will work).
  5. Identify paragraphs starting with ">>>" (leading whitespace is
     allowed) as Python code (i.e., literal).
  6. Recognise bullet list items (as in ST; the use of "o" as a bullet
     may go away following David's comments).
  7. Recognise numbered list items. The final dot *is* required, since
     otherwise step 3 above would fail on::

        My favourite drink is tea, but also
        I like coffee

     (it would take the "I" at the start of the second line for a Roman
     numeral!).
     Note that one won't be able to do::

        And the final number is

     without a spurious list, but I reckon we can live with that!
     (We have to pay for apparent simplicity with true complexity.)
  8. Recognise descriptive list items. Markup is allowed in the "title"
     of an item, so one can write::

        ' -- ' -- This is an awkward case

  9. Recognise *emphasised* text, **strong** text and 'literal' text.
     Nesting of markup does not work, except by "accident" (and you
     can't nest markup inside literal text anyway, since it won't be
     seen!). Emphasised and strong text may contain any characters
     (except the terminating sequence, of course), and inline literals
     may contain anything but "'". Escaping characters is not yet
     addressed.
 10. Emit a "dump" of the data structure that is built up.
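
Steps 2, 3 and 7 can be sketched in a few lines of Python. This is an illustration under assumptions, not the code described above. The function name `split_paragraphs` is hypothetical, and so are the exact patterns: bullets are taken to be "-", "*" or "o"; numbered items are digits, a single letter or a Roman numeral, always followed by a dot; descriptive items are "title -- text":

```python
import re

# Hypothetical patterns for lines that open a list item.
BULLET = re.compile(r"\s*[-*o]\s")
# The trailing dot is mandatory, so "I like coffee" is not read as item "I".
NUMBERED = re.compile(r"\s*(?:\d+|[A-Za-z]|[ivxlcdmIVXLCDM]+)\.(?:\s|$)")
# Assumption: any line containing " -- " after some text opens a descriptive item.
DESCRIPTIVE = re.compile(r"\s*\S.*? -- ")


def starts_list_item(line):
    """True if the line looks like the start of a bullet, numbered or descriptive item."""
    return any(pattern.match(line) for pattern in (BULLET, NUMBERED, DESCRIPTIVE))


def split_paragraphs(text):
    """Split text into paragraphs (lists of lines) at blank lines and list items."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
        elif current and starts_list_item(line):
            # A list-item line in mid-paragraph starts a new paragraph.
            paragraphs.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs
```

With these patterns the first example above yields three paragraphs and the tea-and-coffee example stays as one. A line that happens to begin with a number and a dot still starts a new paragraph, which is the trade-off accepted in step 7.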
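
Steps 4 and 5 amount to labelling paragraphs once they have been split. The sketch below assumes that each paragraph arrives as a list of lines with its indentation intact. It also assumes, as in STNG, that the literal subparagraphs are those indented more deeply than the paragraph ending in "::". The name `classify` is hypothetical:

```python
def indent_of(paragraph):
    """Indentation of a paragraph, measured on its first line."""
    first = paragraph[0]
    return len(first) - len(first.lstrip())


def classify(paragraphs):
    """Label each paragraph (a list of lines) as "text", "literal" or "python"."""
    labelled = []
    literal_base = None  # indentation of the last paragraph ending in "::"
    for paragraph in paragraphs:
        indent = indent_of(paragraph)
        if literal_base is not None and indent > literal_base:
            # A deeper-indented subparagraph of a "::" paragraph is literal text.
            labelled.append(("literal", paragraph))
            continue
        literal_base = None
        if paragraph[0].lstrip().startswith(">>>"):
            # Leading whitespace is allowed before the ">>>" prompt.
            labelled.append(("python", paragraph))
        else:
            labelled.append(("text", paragraph))
            if paragraph[-1].rstrip().endswith("::"):
                # Even a paragraph consisting of just "::" counts as text here.
                literal_base = indent
    return labelled
```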

Note that the markup is handled in a way that should make it simple to
customise. I may well provide the final product as an example of this,
with an "STNG-like" core and the pyST extras as an "extension".
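
One way to make the inline markup of step 9 this easy to customise is to drive the recogniser from a table, so that an extension just adds rows. The table layout, the name `parse_inline` and the `underline` row below are all hypothetical. Like the current code, the sketch does not handle nesting:

```python
import re

# Hypothetical STNG-like core table: (element name, pattern with one group).
# Patterns must never match the empty string, or the loop would not advance.
CORE_MARKUP = [
    ("strong", r"\*\*(.+?)\*\*"),  # anything up to the first closing "**"
    ("emphasis", r"\*(.+?)\*"),    # anything up to the first closing "*"
    ("literal", r"'([^']*)'"),     # anything except a quote character
]


def parse_inline(text, markup=CORE_MARKUP):
    """Split text into (name, content) pieces, using "text" for unmarked runs."""
    compiled = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in markup]
    pieces, pos = [], 0
    while pos < len(text):
        # Take the earliest match; on a tie the earlier table row wins,
        # which is why "**" must be listed before "*".
        best = None
        for name, regex in compiled:
            match = regex.search(text, pos)
            if match and (best is None or match.start() < best[1].start()):
                best = (name, match)
        if best is None:
            pieces.append(("text", text[pos:]))
            break
        name, match = best
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append((name, match.group(1)))
        pos = match.end()
    return pieces


# An "extension" is just a longer table (the underline row is illustrative only).
EXTENDED_MARKUP = CORE_MARKUP + [("underline", r"_([^_]+)_")]
```

Keeping the core and the extras as separate lists mirrors the split between an "STNG-like" core and the pyST extras.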

I'd prefer not to make the code public until the data structure is
DOM-based (see A below), but if anyone *really* wants it, I can make it
available for download.

Things to do next (in no particular order) include:

 A. Move the data structure to a DOM model (probably based on
    Python 2.0's minidom.py). DOM looks sensible because it is
    (erm) fashionable, and if I use minidom then I get XML
    output for free.

 B. Add more markup (I've got a bare minimum for testing at the moment).

 C. Define the command line interface (i.e., how to specify that
    one wants to parse a file or package, what one wants the output to
    be, and so on).

 D. Document what it does, so that David and company can haggle over the
    exact syntax supported. This obviously includes making sure the
    whole thing has nice (correct) docstrings throughout.

 E. Make nested markup work, so one can do::

        *This is **strong and 'literal'** text within emphasised*
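
For item A, the standard library's xml.dom.minidom (still present in Python 3) does give XML output with no extra work. The element names `docstring` and `paragraph` and the `type` attribute in this sketch are assumptions chosen for illustration, not a proposed document type:

```python
from xml.dom.minidom import getDOMImplementation, parseString


def build_document(paragraphs):
    """Build a DOM tree from (label, text) pairs such as ("literal", ">>> 1 + 1")."""
    doc = getDOMImplementation().createDocument(None, "docstring", None)
    root = doc.documentElement
    for label, text in paragraphs:
        element = doc.createElement("paragraph")
        element.setAttribute("type", label)
        element.appendChild(doc.createTextNode(text))
        root.appendChild(element)
    return doc


doc = build_document([("text", "This is a paragraph."), ("literal", ">>> 1 + 1")])
xml_text = doc.toxml()  # the "free" XML output; characters such as < and & are escaped
# Parsing the output again recovers the same structure and text.
again = parseString(xml_text)
```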
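
Item E needs a recogniser that can descend into emphasised and strong text, rather than matching each span with a single pattern. This recursive sketch rests on three assumptions: "**" is always tried before "*", literal text is never searched for further markup, and an opener with no matching closer is kept as ordinary text. The names `parse_nested` and `_parse` are hypothetical:

```python
NAMES = {"**": "strong", "*": "emphasis"}


def _parse(text, pos, closer):
    """Parse text[pos:] until closer; return (nodes, end position, closed?)."""
    nodes, buf = [], []

    def flush():
        # Move any accumulated plain characters into a "text" node.
        if buf:
            nodes.append(("text", "".join(buf)))
            buf.clear()

    while pos < len(text):
        # "**" is checked before "*" so strong text can open inside emphasis.
        marker = "**" if text.startswith("**", pos) else "*" if text[pos] == "*" else None
        quote_end = text.find("'", pos + 1) if text[pos] == "'" else -1
        if marker is not None:
            if marker == closer:
                flush()
                return nodes, pos + len(marker), True
            children, end, closed = _parse(text, pos + len(marker), marker)
            if closed:
                flush()
                nodes.append((NAMES[marker], children))
                pos = end
            else:
                # No matching closer: treat the marker as plain text.
                buf.append(marker)
                pos += len(marker)
        elif quote_end != -1:
            # Literal text runs to the next quote and is not parsed further.
            flush()
            nodes.append(("literal", text[pos + 1:quote_end]))
            pos = quote_end + 1
        else:
            buf.append(text[pos])
            pos += 1
    flush()
    return nodes, pos, False


def parse_nested(text):
    """Return a tree of ("text" | "literal", str) and ("emphasis" | "strong", children)."""
    return _parse(text, 0, None)[0]
```

On the example in E this gives an emphasis node containing plain text, a strong node (itself holding "strong and " and a literal), and more plain text.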

I do *not* intend to provide support for tables (that can come later)!

Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)