[Doc-SIG] Documentation tool

Edward Welbourne <eddyw@lsl.co.uk>
Tue, 23 Jun 1998 21:05:36 +0100


As someone far more eloquent once said ... sorry this is so long, but I
don't have time to shorten it.

The really unstoppable reason I prefer import-based handling of doc
strings is that *it means there is only one parser for python code*.
The significance of that to me is like a religious thing, so I'll
remember to tolerate other views but *I know it's right* ;^>
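
A minimal sketch of what import-based handling might look like, assuming
a modern Python with the standard importlib and inspect modules (neither
is named in this discussion); the helper name collect_docs is
hypothetical.  The tool never reads source text: the import runs the
one real parser, and the tool only looks at the objects that result.

```python
import importlib
import inspect


def collect_docs(module_name):
    """Import a module and return {qualified name: docstring} for it.

    No source text is parsed here: importing runs the interpreter's own
    parser, and we only read docstrings off the live objects.
    """
    module = importlib.import_module(module_name)
    docs = {module_name: inspect.getdoc(module)}
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        # Report only functions and classes defined in this module,
        # not names it happens to have imported from elsewhere.
        is_api = inspect.isfunction(obj) or inspect.isclass(obj)
        if is_api and getattr(obj, "__module__", None) == module_name:
            docs[module_name + "." + name] = inspect.getdoc(obj)
    return docs


if __name__ == "__main__":
    for name, doc in sorted(collect_docs("textwrap").items()):
        first_line = (doc or "(no docstring)").splitlines()[0]
        print(name + ": " + first_line)
```

The cost of this approach is that the module's top-level code runs when
it is imported, which a source-scanning tool would avoid.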

Recent discussion has had a lot of stuff about comments, docstrings and
generating reference manuals.  I have strong opinions; I hope they can
be made useful.

There are three kinds of `documentation' I want associated with a tool:

  comments -- which I read when I'm modifying the tool

  manuals -- which I read before trying to write code which uses the tool

  docstrings -- which I read when I find the tool in my hands

The first is like a bar-code on goods in a shop - the shop needs it
there, but the customer doesn't look at it; the second is like the
information on the packaging, the shop has to provide it and the
customer is only going to read it once; the last is the few simple
things etched on the article itself - the things that will remind me of
all the important bits when I'm using the goods.

When I want to know which tool to use, I consult the manual: and I'm
happy enough to have an autogenerated rendition of the docstrings in the
manual (as an appendix), but the thing I want the manual for is to guide
me from `I know what I want to do' to `here are the tools to do it'.

When I'm running my code in a debugger and it's carrying around objects
from someone else's package, I probably don't know much about how those
objects behave.  If they're not doing what I expect of them, I need to
be able to ask them about themselves.  In python, I can ask for
information about which package provided them, so I could RTF sources
and work out what's going on: but one hacker's beautiful code is
another's spaghetti forgetti.  If I can ask the object itself to tell me
about itself, I'm much happier - if only because I can then tell the
difference between a bug in the tool I'm using and a gap between my
expectations and the tool designer's intent.
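
As a rough illustration of asking an object about itself, here is a
sketch assuming a modern Python and its standard inspect module; the
helper name describe is hypothetical, and Fraction merely stands in for
an object from someone else's package.

```python
import inspect
from fractions import Fraction


def describe(obj):
    """Return a short report on what a live object says about itself."""
    cls = type(obj)
    # Fall back to the class's docstring if the object has none of its own.
    doc = inspect.getdoc(obj) or inspect.getdoc(cls) or "(no docstring)"
    return {
        "type": cls.__name__,
        "module": cls.__module__,  # which package provided the object
        "summary": doc.splitlines()[0],
    }


if __name__ == "__main__":
    print(describe(Fraction(1, 3)))
```

The module name is enough to go and find the sources if need be; the
docstring summary is what saves having to.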

I can't fix (or enhance) the tool without making sense of its source:
and that's when I need the comments.  They tell me why the code does
things the way it does, they can contain warnings about silly mistakes I
might introduce if I try to do the obvious, and they're where I'm going
to leave the information that it was me who changed it, and why.


Consequently,

  comments belong within the source code and should be (ignored and)
  thrown away by the parser - which is what most sensible programming
  languages do, albeit with some bodges here and there. 

  manuals belong without the source code; they should have sophisticated
  indexes, glossaries, etc.; they even have a right to worry about being
  pretty; the manual is not part of the program, though it is part of
  the installation.

  docstrings need to be attached to objects when they're running, so
  they have to have a presence in the source code (though they may
  legitimately be *constructed* at run-time) and be recognised by the
  language.

and I don't want any of them intruding on the others' territory:
docstring information shouldn't be taking up comment space (though I
accept that tagging arguments with `type information' in comments would
be a lot better than an incompatible change in the parser to fit that
information into the code) and documentation derived from docstrings
should be kept to the appendices of the manual.  [If you don't sit down
and write a manual, you won't have a good manual.]  In particular, the
parser shouldn't tamper with the value of __doc__, least of all for the
sake of a few comments (which it might have misunderstood).
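
A small sketch of that division of territory, with hypothetical names
throughout (UNITS, to_metres): the comment stays in the source and never
reaches the running program, while the docstring is constructed at
run-time and still ends up attached to the object.

```python
UNITS = ("mm", "cm", "m")


def to_metres(value, unit):
    # Maintainer's note: keep this table in step with UNITS.  The
    # parser discards this comment; it is not part of the interface.
    factor = {"mm": 0.001, "cm": 0.01, "m": 1.0}[unit]
    return value * factor


# The docstring is built at run-time, yet it is attached to the function
# object, which is where someone holding the function will look for it.
to_metres.__doc__ = (
    "Convert value to metres.\n\n"
    "unit must be one of: " + ", ".join(UNITS) + "."
)
```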

The parser does already extract some information (prototype and `where
is the source for this', at least) and store it in namespaces.  It could
do some similar things with `comments as type information', but would
have to mistrust this (essentially because we don't want the parser
throwing out code just because its comments are wrong).  It could get
better information, of this kind, from assertions in the code: which are
an extant python feature capable of much richer information than mere
type and default - eg stipulating relationships among the arguments.
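
For instance, a sketch (the function clip is hypothetical) in which
assertions carry both a type constraint and a relationship among the
arguments that a bare type tag could not express:

```python
def clip(value, low, high):
    """Return value limited to the closed range [low, high]."""
    # Assertions state the contract in executable form; unlike a
    # comment, a wrong one is noticed the first time it is exercised.
    assert isinstance(value, (int, float)), "value must be a number"
    assert low <= high, "low must not exceed high"
    return max(low, min(value, high))
```

Bear in mind that Python drops assert statements when run with -O, so
they document and check the contract but cannot be the only defence.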

Tools can be built (indeed, I think the gendoc folk have done so) for
presenting such `gleaned' information in a helpful form at runtime,
possibly combined with __doc__ strings.  (Another possibly valuable
ingredient, by the way, is information extracted from assertions at the
start of a function -- I'd trust these a lot better than the comments.)
I regard the combination as docstrings.
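
This is not how gendoc does it; it is only a sketch of the idea,
assuming a modern Python with the standard inspect module.  The names
runtime_help and scale are hypothetical.  It combines what the
interpreter already recorded (prototype, where the source is) with the
__doc__ string:

```python
import inspect


def runtime_help(func):
    """Combine what the interpreter recorded about func with its __doc__."""
    lines = [
        # The prototype, as gleaned by the parser.
        func.__name__ + str(inspect.signature(func)),
        "defined in {}, line {}".format(
            func.__module__, func.__code__.co_firstlineno),
        "",
        inspect.getdoc(func) or "(no docstring)",
    ]
    return "\n".join(lines)


def scale(vector, factor=2):
    """Return a new list with every element of vector multiplied by factor."""
    return [factor * x for x in vector]


if __name__ == "__main__":
    print(runtime_help(scale))
```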

The reason I don't have much time for comments contributing to
docstrings (aside from the fact that this isn't what they're for and it
would intrude in what they are for) is that, if we can agree the
semantics of the comments, we could surely as readily agree the
necessary semantics for the structured text format used in __doc__
strings, which are the proper home of this information.

That said, I confess the case of `type-tags' on arguments forms a pretty
good case for one exception - because they evade repetition of the
argument names.

If we do tie such documentation to the argument, how am I going to
access it ?  The arguments of a function aren't accessible from any
other namespace ... and what *object* would be carrying this `argdoc
string' ?

As partial answer to that, we could have the function carry around a
dictionary of interface docstrings, func.__ifdocs__ say, whose keys are
argument names, values are argdoc strings (or possibly some kind of
datastructure packaging those along with default, argument number, ...);
special key '' means the function's return value.
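
A sketch of how that might look, assuming a Python that allows arbitrary
attributes on function objects (later versions do).  The decorator name
ifdocs, its `returns' keyword and the example function slurp are all
hypothetical; only __ifdocs__ and the '' key come from the suggestion
above.

```python
def ifdocs(**argdocs):
    """Attach a dictionary of argdoc strings to a function.

    Keys are argument names; the keyword `returns` is stored under the
    special key '' to describe the function's return value.
    """
    def attach(func):
        table = dict(argdocs)
        if "returns" in table:
            table[""] = table.pop("returns")
        func.__ifdocs__ = table
        return func
    return attach


@ifdocs(path="name of the file to read",
        encoding="text encoding, defaulting to UTF-8",
        returns="the file's contents as one string")
def slurp(path, encoding="utf-8"):
    """Read a whole text file into memory."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


if __name__ == "__main__":
    for name, text in sorted(slurp.__ifdocs__.items()):
        print((name or "(return value)") + ": " + text)
```

One weakness of this particular spelling is that a function with an
argument actually called `returns' could not document it this way.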


Anyway, here's a suggestion about specifying arguments in __doc__ -
since I don't like docstring info in comments, I have to at least offer
a suggestion:

in the structured text format (STF hereafter, URL is
http://www.cwi.nl/www.python.org/sigs/doc-sig/) there's a convention
that any `paragraph' (as there defined) ending in the word `example', in
any case, with optional trailing s and punctuators, introduces some text
which should be presented `as is' (subject to the usual rules for
rationalising leading whitespace on lines);

likewise, we could have any paragraph ending in `argument' have a
similar behaviour, only it's expecting to introduce a descriptive list
(so the magic might only happen if that's what does follow).  We almost
certainly *do* need some sensible markup rules (not yet in the STF last
time I checked) for delimiting the details of an argument: but the
basics of a descriptive list should do us pretty well.
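
To show that the rule is easy to recognise mechanically, here is a
sketch under stated assumptions: paragraphs are separated by blank
lines, each item of the descriptive list is its own paragraph of the
form `name -- description', and indentation is ignored.  None of that is
settled STF; the names argument_docs and EXAMPLE are hypothetical.

```python
import inspect
import re

# A paragraph "ends in `argument'" if its last word is argument or
# arguments, in any case, followed only by punctuation.
_INTRO = re.compile(r"\barguments?[\s.:;,!?]*$", re.IGNORECASE)


def argument_docs(docstring):
    """Return {name: description} from a docstring's argument list."""
    text = inspect.cleandoc(docstring)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text)]
    found = {}
    collecting = False
    for para in paragraphs:
        if collecting:
            name, sep, description = para.partition(" -- ")
            if sep:
                # Collapse line breaks inside a wrapped description.
                found[name.strip()] = " ".join(description.split())
                continue
            collecting = False  # the descriptive list has ended
        if _INTRO.search(para):
            collecting = True
    return found


EXAMPLE = """Copy a file.

    This function takes the following arguments:

    source -- path of the file to read

    target -- path of the file to write;
      it is overwritten if it exists

    Returns the number of bytes copied.
    """

if __name__ == "__main__":
    for name, text in argument_docs(EXAMPLE).items():
        print(name + ": " + text)
```

If the paragraph after the introduction is not a descriptive list,
nothing is collected, so the magic only happens when a list does follow.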


To my mind, comments are directed at the code's maintainer *and no-one
else*, whereas the docstring is the interface specification - the
contract between the code's maintainer and the code's user.  I don't
care how convenient blurring that is, or for whom; that separation is
Important:

  as the author of code calling yours, I want to be able to look at the
  doc string and know what I'm allowed to assume about what your tools
  will do for me; that's the bit that I'm trusting you to continue
  honouring from this version into the next (and that's where I expect
  you to tell me that you intend to retire one of the arguments at a
  later version);

  again as that author, I don't want to have to RTF source and work out
  which bits are directed at me, which at the maintainer's colleagues;

  as the maintainer of my code, I want folk to call it assuming that it
  does everything it claims to do, and assuming nothing else.  Boring
  facts of experience say that: folk who had to flick through the code
  of the implementation come away thinking they know better than the
  interface documents.  They then *don't* call it with some arguments it
  should accept (because they've discovered it doesn't cope), so I don't
  get bug reports that I need; and they *do* call it with arguments it
  never promised to take, so I get to field false bug reports when I
  come to make changes within the interface spec I published.

I want to keep comments and doc-strings separate.

By the by: what were the eventual conclusions of all that discussion
about how to specify cross-references in docstrings ?  Is there a
succinct statement of it somewhere ?

	Eddy.