[Doc-SIG] docutils status report
Tony J Ibbs (Tibs)
tony@lsl.co.uk
Thu, 30 Nov 2000 14:34:18 -0000
Well, the Tools/compiler hint was just what I needed - it took very
little time to start extracting docstrings from a Python file, and
it's fairly clear how one would manage a package.
Anyway, here is a status report and some "to do" information:
I currently have some code that will:
1. Find the docstrings in a Python file (currently itself!)
2. Split the text into paragraphs at blank lines.
3. Identify lines within a paragraph that start like a list
item, and split there as well - this allows::
    This is a paragraph.
    1. So is this
    fred -- and so is this
to produce 3 paragraphs, and I think this is the main place that
Python users want/need to be able to have "no blank lines".
4. Identify subparagraphs of a paragraph ending "::" as literal
text (note that, as in STNG, the "::" is enough text to constitute
"text in a paragraph", so the "empty header" idea will work).
5. Identify paragraphs starting ">>>" (allowing leading whitespace)
as Python code (i.e., literal)
6. Recognise bullet list items (as in ST - the use of "o" may go away
following David's comments)
7. Recognise numbered list items (but the final dot *is* required,
otherwise 3 above will fail on::
    My favourite drink is tea, but also
    I like coffee
(thinking the second line to have a Roman numeral at the start!)).
Note that one won't be able to do::
    And the final number is
    1.
without a spurious list, but I reckon we can live with that!
(We have to pay for apparent simplicity with true complexity.)
8. Recognise descriptive list items. Note that markup is allowed in
the "title" of the item, so one can do::
    ' -- ' -- This is an awkward case
9. Recognise *emphasised* text, **strong** text and 'literal' text
(but nesting of markup does not work, except by "accident"; and in any
case you can't *nest* markup in a literal, since it won't be seen!).
The emphasised and strong texts may contain any characters (except
the terminating sequence, of course), and inline literals may
contain anything but "'". Escaping characters is not yet addressed.
10. Emit a "dump" of the datastructure that is built up.
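Items 1 to 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code described in this message: it uses the standard `ast` module rather than Tools/compiler (which no longer ships with current Python), and the `LIST_ITEM` pattern and both function names are hypothetical.

```python
import ast
import re

# Assumed list-item starts (not the author's actual patterns): a bullet,
# a number or Roman numeral followed by a dot, or a term followed by " -- ".
LIST_ITEM = re.compile(
    r"""\s*(?:
        [*o-]\s+                       # bullet item
      | (?:\d+|[ivxIVX]+)\.(?:\s|$)    # numbered item, final dot required
      | \S.*?\s--\s                    # descriptive item
    )""",
    re.VERBOSE,
)


def find_docstrings(source):
    """Yield (name, docstring) for the module and each class or function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is not None:
                yield getattr(node, "name", "<module>"), doc


def split_paragraphs(text):
    """Split at blank lines, and also before any line that looks like a list item."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
        elif current and LIST_ITEM.match(line):
            # A list-item start ends the paragraph without a blank line.
            paragraphs.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return ["\n".join(p) for p in paragraphs]
```

Because the dot is required after a number, a line beginning "I like coffee" is not mistaken for a Roman-numeral item, while a lone "1." on its own line still starts a (spurious) new paragraph, as described in item 7.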
Note that the markup is handled in a way that should be simple to
customise - I may well provide the final product as an example of this,
with an "STNG-like" core and the pyST extras as an "extension".
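One way such customisable markup handling might look is a table of patterns that an extension can simply append to. The sketch below is an assumption for illustration only (the names `INLINE_MARKUP` and `tokenize` are hypothetical), and like the current code it does not handle nesting or escaping.

```python
import re

# Ordered table of (kind, pattern). "strong" comes before "emphasis" so that
# "**" wins when both patterns match at the same position. An extension
# could add its own entries to this list.
INLINE_MARKUP = [
    ("strong", re.compile(r"\*\*(.+?)\*\*")),
    ("emphasis", re.compile(r"\*(.+?)\*")),
    ("literal", re.compile(r"'([^']+)'")),
]


def tokenize(text):
    """Split text into (kind, content) pairs; markup is not nested."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Find the markup that starts earliest at or after pos.
        best = None
        for kind, pattern in INLINE_MARKUP:
            match = pattern.search(text, pos)
            if match and (best is None or match.start() < best[1].start()):
                best = (kind, match)
        if best is None:
            tokens.append(("text", text[pos:]))
            break
        kind, match = best
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append((kind, match.group(1)))
        pos = match.end()
    return tokens
```

Keeping the patterns in data rather than in the control flow is what makes the "core plus extension" split cheap: the loop never needs to change when markup is added.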
I'd prefer not to make the code public until I've got stuff DOM structured
(see below), but if anyone *really* wants it, I can make it available
for download.
Things to do next
-----------------
(in no particular order) include:
A. Move the datastructure to a DOM model (probably based on
Python 2.0's minidom.py). DOM looks sensible because it is
(erm) fashionable, and if I use minidom then I get XML
output for free.
B. Add more markup (I've got a bare minimum for testing at the moment)
C. Define what the command line interface is (i.e., how to specify that
one wants to parse a file or package, what one wants the output to
be, and so on.)
D. Document what it does, so that David and company can haggle over the
exact syntax supported. This obviously includes making sure the
whole thing has nice (correct) docstrings throughout.
E. Make nested markup work, so one can do::
    *This is **strong and 'literal'** text within emphasised*
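As a rough illustration of item A, the sketch below builds a small DOM tree with `xml.dom.minidom` and shows why XML output then comes for free. The element names ("docstring", "paragraph") and the function name are assumptions made up for this example, not a proposed schema.

```python
from xml.dom.minidom import Document


def paragraphs_to_dom(name, paragraphs):
    """Wrap a list of paragraph strings in a minimal DOM document."""
    doc = Document()
    root = doc.createElement("docstring")   # hypothetical element name
    root.setAttribute("name", name)
    doc.appendChild(root)
    for text in paragraphs:
        para = doc.createElement("paragraph")   # hypothetical element name
        para.appendChild(doc.createTextNode(text))
        root.appendChild(para)
    return doc


if __name__ == "__main__":
    dom = paragraphs_to_dom("example", ["First paragraph.", "Second one."])
    # toprettyxml() serialises the whole tree without any extra code.
    print(dom.toprettyxml(indent="  "))
```

Once the parser emits nodes like these, the "dump" in item 10 becomes a call to `toxml()` or `toprettyxml()`.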
I do *not* intend to provide support for tables! (That can come later.)
Tibs
--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)