[Doc-SIG] What docs should be in the source file?

Edward Welbourne <eddy@chaos.org.uk>
Thu, 22 Mar 2001 20:36:19 +0000 (GMT)

> What i had in mind was "how to use this module".
I'll come back to this: it points to a middle ground ...

> As an extreme example, try running "perldoc CGI".  CGI.pm contains
> about 3200 lines of code followed by 3000 lines of detailed
> documentation.  While the module itself is indeed enormous, i think
> that it is useful to have all of that information about how to use the
> CGI module instantly available right there in CGI.pm.

Having played enough with pod to see that it would be a good tool for
the kinds of in-code doc I discuss below, *but no more*, I would argue
that pod is actually the perfect illustration of why I *don't* want to
go this far (even allowing for CGI.pm to be chopped into little pieces).
The docs one gets out of pod are just about OK for showing to techies
who only really care about information content and don't mind *too much*
if it's rather badly presented - as various remarks in the perlpod man
page will make clear, this is intended.  Using it for anything more
leads to ugly docs (I shan't be quick to forget the look of *disgust*
on the face of a technical author colleague when I asked, yesterday,
what to do about exactly some such docs ... I'm glad pod was the focus
of that, else I'd have curled up and died).

> ... The question is "how far is the user of a module from *some*
> information on how to use the module?"  Doesn't matter if they don't
> have every article that anyone has ever written about the module -- do
> they have a starting point?

Edward & Tibs have clearly been devoting *much* effort to consideration
of how to embed xrefs in the doc strings (and yes, <...> is the morally
correct way to delimit URLs; but I'll come back to that in a separate
e-mail), so - at least in principle - the doc string contains xrefs to
all the flavours of doc that might exist in connection with the module.
Even if it only documents the naked API, its xrefs are a starting point.
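To make the `starting point' idea concrete: if xrefs in a doc string are delimited by <...>, a tool can pull them out of any live object. This is a minimal Python sketch; the regular expression and the example class (with its URLs) are assumptions for illustration, not anything the thread has agreed on.

```python
import re

# Assumption: an xref is anything wrapped in <...> with no spaces inside.
_XREF = re.compile(r"<([^<>\s]+)>")

def doc_xrefs(obj):
    """Return the <...>-delimited references in obj's docstring, in order."""
    return _XREF.findall(obj.__doc__ or "")

class Interpolator:
    """Fill gaps in a sequence of samples (hypothetical example class).

    Reference manual: <http://www.python.org/doc/>
    Module chapter:   <lib/module-interp.html>
    """

print(doc_xrefs(Interpolator))
# ['http://www.python.org/doc/', 'lib/module-interp.html']
```

Even when the doc string says nothing beyond the bare API, a list like this tells the reader where to go next.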

> It's also harder for me to change foo() to spam() in just the code,
> check in just that part, and say "oh, i'll change the docs later" --
> because i'll be checking in a single file that's inconsistent with
> itself.

> If the docs are in a different file, I can do a CVS diff to see what's
> changed in the code since the last time I updated the docs, and thus
> can do updates to the documentation "in batch."

and, when I know several of my colleagues will be changing the same file
some more in the present release cycle, batching the doc changes may
well be The Right Thing To Do - especially if, as where I work, there's
a separate documentation group ... and I trust their idea of what
constitutes an intelligible presentation of `how to use this tool'
better than I trust most techies, myself included.

So TRTTD may well be to send an e-mail to the doc group saying `I
changed module foo in this way, I *have* revised the API docs within it,
I think you need to change sections 2, 7 and 11 of the refman, along
with all references, in all docs, to method fudge() on class
Interpolator', rather than messing up their docs (which are likely
maintained in some other doc format anyway, precisely because real doc
teams don't believe in the sorts of doc-tool that techies think of as
the bee's knees).  Furthermore, the changes to documentation, even if I
draft them before the doc team goes to work, will probably need to be
integrated with several other sets of changes made by colleagues whose
projects impact the same source file in the same release cycle.

I may need to check in my code changes to get the automatic test tools
to run the right tests on all platforms (as opposed to the one or two on
which I test it myself before freezing) and I may well be checking in a
prototype or first draft of my changes in order to find out which
irritating platform-variations are going to force me to revise my
approach before I can settle the final issues of the design that goes
into the release candidate; so it may not be `laziness' that I leave out
my changes to the docs - it may be the prudence of `I shall almost
certainly be changing this some more, and shall not know for sure how
until later', which would make large amounts of effort on the docs futile.

So `oh leave the docs for now' may actually be wise and prudent; and, as
Edward says, I (or our doc team) can ask tools where changes to docs are
needed.  Indeed, in an ideal world, the doc team has taken the design
spec I wrote before I began coding and is working on the user-oriented
docs at the same time that I'm changing the code.  It is generally best,
under *any* version-control system, to avoid having two sets of changes
proceeding on the same file at the same time; and I'll be reviewing the
doc team's work while the doc team review my changes to the API docs, so
we do get to catch glitches.
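As a rough illustration of `asking tools where changes to docs are needed': the sketch below compares two versions of a module's source and lists the public functions whose names or positional parameters changed, which is roughly what one would otherwise dig out of a CVS diff by hand. It is a hypothetical helper, not an existing tool, and it deliberately ignores classes, defaults and keyword-only arguments.

```python
import ast

def public_signatures(source):
    """Map each public top-level function name to its positional parameter names."""
    sigs = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            sigs[node.name] = tuple(a.arg for a in node.args.args)
    return sigs

def docs_needing_review(old_source, new_source):
    """Names of public functions added, removed or re-signed between versions."""
    old, new = public_signatures(old_source), public_signatures(new_source)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))

OLD = '''
def foo(x):
    return x

def helper(y):
    return y
'''

NEW = '''
def spam(x):
    return x

def helper(y, z):
    return y + z
'''

print(docs_needing_review(OLD, NEW))  # ['foo', 'helper', 'spam']
```

A report like this could go to the doc team in place of hand-written notes about which sections to revisit.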

Now, back to Ping's
> What i had in mind was "how to use this module".
and here I'm with Ping, regarding Edward's `Only the API' line as being
too purist - or confusingly phrased; I shan't be surprised if Edward's
idea of `Only the API' does include `how to use this API', so I suspect
we aren't as far apart as we seem to imagine.

So I have half a guess that the following might bring us closer together:

  The python source code contains doc strings which explain how to use
  the code; this is expressed in ST and targeted at maintainers and
  interrogators - i.e. folk who are either looking at the source or
  playing with an object their python session has given them, whose
  behaviour they need to know about, ideally without being obliged to
  look at the implementation (even assuming they have it).

  Other files contain documentation of other kinds, possibly in other
  formats; project management and version control can be used to flag
  which of these will need to be changed when the code changes.  The
  source docs cross-reference these.
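For concreteness, here is roughly the kind of in-code doc string this proposal has in mind, written as plain text rather than in any particular markup. The function, its `Promises:'/`Usage:'/`See also:' labels and the chapter it points at are all hypothetical.

```python
def interpolate(samples, position):
    """Return the linearly interpolated value of samples at position.

    Promises: for 0 <= position <= len(samples) - 1 the result lies
    between the two neighbouring samples; any other position raises
    ValueError.  Nothing is promised about rounding.

    Usage: interpolate([0.0, 10.0], 0.25) gives 2.5.

    See also: <http://www.python.org/doc/> (hypothetical home of the
    reference-manual chapter for this module).
    """
    if not 0 <= position <= len(samples) - 1:
        raise ValueError("position out of range")
    lower = int(position)
    if lower == len(samples) - 1:
        return samples[lower]  # exactly on the last sample
    frac = position - lower
    return samples[lower] + frac * (samples[lower + 1] - samples[lower])
```

The doc string says what the function promises and how to call it, and points elsewhere for everything else; nothing in it should need changing unless the behaviour does.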

The source docs *should* suffice to generate (possibly crude)
documentation in (at least) man and HTML formats, which should be of a
good enough standard to serve as the *start-point* for writing the
reference manual; indeed, if one isn't too fussed about the reference
manual being beautiful, they should suffice *as* a reference manual.

The source doc format *must* be sufficiently straightforward that

  a maintainer looking at the code *will* read and understand them
  without suffering eye-pain (on which HTML fails for Guido at least)

  a maintainer changing the code *will* be able to see what changes to
  make to the docs and *will not* be put off making those changes by
  doubts about how to express them

  an interrogator with a python object `in hand' can (choosing their own
  interrogation tools, and using these) get the object to tell them all
  they need to know to determine what it promises to do (and what it doesn't)

  the author of potential client code can ask tools to tell them which
  source modules to consider using, and can glean enough information from
  the docs of those modules to make informed (ideally: correct) choices.
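The interrogator's case needs nothing beyond the object itself. A minimal sketch using the standard `inspect` module; `interrogate` and `scale` are names made up for this example.

```python
import inspect

def interrogate(obj):
    """One-line summary of what a live object says about itself."""
    name = getattr(obj, "__qualname__", type(obj).__name__)
    try:
        sig = str(inspect.signature(obj))
    except (TypeError, ValueError):
        sig = ""  # not callable, or no signature metadata available
    doc = inspect.getdoc(obj) or "(no docstring)"
    return f"{name}{sig}: {doc.splitlines()[0]}"

def scale(values, factor=2):
    """Multiply each value by factor.

    Returns a new list; the input is left unchanged.
    """
    return [v * factor for v in values]

print(interrogate(scale))  # scale(values, factor=2): Multiply each value by factor.
```

The same call works on anything an interactive session hands back, which is why the doc string, not some separate file, has to carry the promise.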

The maintainer's needs call for simplicity of format, the interrogator's
call for richness, albeit with some cross-over both ways; good tools can
make a big difference to the richness (e.g. all that stuff about
trawling base classes for matching methods, providing default doc
strings, etc.).  The client-author's needs call for standardisation
(hence Tibs' work on labels).
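To illustrate the `trawling base classes' kind of tool support: an override that carries no doc string of its own can borrow the one from the method it overrides. A minimal sketch with hypothetical classes; since Python 3.5 the standard `inspect.getdoc` does a version of this itself.

```python
def inherited_doc(cls, method_name):
    """Return the first docstring for method_name found along cls's MRO."""
    for klass in cls.__mro__:
        member = klass.__dict__.get(method_name)
        # Skip classes that don't define the name, and overrides without docs.
        if member is not None and getattr(member, "__doc__", None):
            return member.__doc__
    return None

class Shape:
    def area(self):
        """Return the area enclosed by the shape, in square units."""
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):  # no docstring: the promise is inherited from Shape
        return self.side ** 2

print(inherited_doc(Square, "area"))
# Return the area enclosed by the shape, in square units.
```

This is the sort of richness that costs the maintainer nothing: the overriding method stays terse, and the tool fills in the promise.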

Practical experience in the field of software maintenance says
unambiguously that simplicity is a very serious issue, especially if one
is to have enough standardised semantic markup to ensure that tools can
do a good job for the client author.  A surfeit of bureaucracy *will*
lead to folk changing the code without bothering to keep the source docs
in sync (let alone the out-of-source ones).  Equally, without suitable
standardised markup, client-authors will be unable to find a good
supplier of round wheels, so they *will* end up using hexagonal ones
`because those are easier to knock together', which will continue to
make a mess of the roads.  Case in point: regexen for URLs.

To meet these needs, the source docs for each method/class/... (call it:
object) *do* need to include:
  * a clear statement of what the object *promises* (and doesn't)
  * a clear statement of *what it's for* and *how to use it*
  * references to more sophisticated docs saying everything else
for as many values of `everything else' as authors can be found to
write.  If we ask for more than this,
  * we'll need such complexity in ST that maintainers won't bother with
    it, so
  * it won't be realistic to expect the in-code docs to stay in sync
    with the code they're in, so
  * we'll have docs that drift out of sync with the code sitting right
    beside it in the same file, so no-one will know which is right.
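If the three items above were given standard labels, checking for them would be cheap enough to run on every check-in. The labels below are an assumption for illustration (the thread has not settled on any), and `fudge` is a hypothetical function.

```python
import inspect

# Hypothetical labels, one per required component of an object's docs.
REQUIRED_LABELS = ("Promises:", "Usage:", "See also:")

def missing_sections(obj):
    """Return the required labels absent from obj's (cleaned) docstring."""
    lines = [line.strip() for line in (inspect.getdoc(obj) or "").splitlines()]
    return [label for label in REQUIRED_LABELS
            if not any(line.startswith(label) for line in lines)]

def fudge(x):
    """Nudge x halfway towards the nearest integer.

    Promises: the result is never further from round(x) than x is.
    Usage: fudge(2.4)
    """
    return (x + round(x)) / 2

print(missing_sections(fudge))  # ['See also:']
```

Keeping the check this dumb is the point: it asks for nothing a maintainer would balk at writing.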

Note that disagreement between code and some docs won't trigger the
`trust neither' rule provided
  * it's immediately clear to the reader which one (the one in a
    different file from the implementation) is wrong, and
  * there are *some* docs with the code which agree with it.
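Python already has one way to keep *some* docs with the code demonstrably in agreement with it: usage examples in the doc string that the standard `doctest` module runs as tests. A minimal sketch; `word_count` is a made-up example.

```python
import doctest

def word_count(text):
    """Return the number of whitespace-separated words in text.

    >>> word_count("the docs live with the code")
    6
    >>> word_count("   ")
    0
    """
    return len(text.split())

if __name__ == "__main__":
    # Silent on success; reports any example whose output disagrees with the code.
    doctest.testmod()
```

If the code changes and the examples are not updated, the test run says so, which removes the `trust neither' doubt for at least those lines.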

>     - Keeping modules and associated docs in the same file helps
>       to ensure that the two are in sync when you distribute or
>       edit the file.  (It's not possible to have different
>       versions of the code and the docs at the same time; it's
>       less likely that someone will check in changes to one
>       without updating the other, etc.)

> 2 issues: editing and distribution
>   distribution -- maybe we want to turn modules into packages, and
>      include docs in the package?  There's not a lot of precedent
>      for this in other languages though..
>   editing -- ...
(I already addressed editing)

The distribution problem defines the boundary quite nicely: reference
manuals, how-to guides and tutorials *shouldn't* change when I fix a
bug, though the in-code docs might (notably for the internal method
which now has to do things slightly differently so that the module
actually implements its documented external API).  Likewise if I totally
re-implement the entire module, but preserve its API; conversely, a
perfectly good module may get its reference manual massively overhauled
without changing one line of the code.

Of course, a real total re-implementation will change the API, but then
it'll equally be part of a `new major release' of the module, so
re-writing the separate docs shouldn't seem out of place.  (Indeed, a
total re-write of the module reference manual will typically reveal
changes needed in the API.)

Furthermore, if you've got the code, you need the API and an overview of
how to use the module; these need to be in sync with the actual
implementation you've got (and to tell you which *version* you've got).
However, you'll probably only use a moderate fraction of the actual
modules in your python installation, so you probably *don't* want a
separate copy of the tutorial and similar `big picture' docs on every
machine on which you install your python distribution; you may be happy
to live with the xrefs pointing to www.python.org or you may want to
have one copy of all the big picture docs on a central server shared by
all pythoneers in a given team.

[Which points to an issue for the URL discussion; one really does want
to be able to specify URLs relative to `the root URL we selected when we
installed our python doc system' which *might* be at www.python.org and
*might* be on a machine on the team's local network or *might* be local
to the actual machine in use; the installation process will doubtless
involve verifying that this URL is accessible and *does* provide the
relevant docs.]
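Resolving such relative xrefs is straightforward once the root is recorded at install time. A minimal sketch using `urllib.parse.urljoin`; `DOC_ROOT` and its value are hypothetical, and the accessibility check mentioned above is not attempted here.

```python
from urllib.parse import urljoin

# Hypothetical root recorded at install time: www.python.org, a team server,
# or a file: URL on the local machine.  The trailing slash matters to urljoin.
DOC_ROOT = "http://docs.example.com/python/"

def resolve_xref(ref, root=DOC_ROOT):
    """Resolve a root-relative xref; absolute URLs pass through unchanged."""
    return urljoin(root, ref)

print(resolve_xref("lib/module-re.html"))
# http://docs.example.com/python/lib/module-re.html
print(resolve_xref("http://www.python.org/doc/"))
# http://www.python.org/doc/
```

Doc strings then carry only the relative part, and moving the big-picture docs to another server means changing one setting rather than every module.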

When it comes to the reference manual, there is even a case for
deliberately choosing to isolate it from the source - so that, for
instance, I can implement a module which will be portable between
versions of python.  If, in it, I rely on the in-code docs of my
locally-installed re module (say) I may well write code which only works
for folk using the same version of python as me.  The reference manual
for module re *should* tell me gotchas about `we changed this between
version 1.5.2 and 2.0 of python, so beware' which (IMO) *should not* be
present in the in-code docs of the module.

So I find myself increasingly confident that TRTTD is to draw a dotted
line between the things which belong with the code and the things which
do not; that all `big picture' docs belong in separate files; that
authors of client code really *do* need to use these `big picture' docs
as their primary source (for portability); and that the in-code docs
should be limited to the API and an account of its proper usage.  The
big picture docs then get to be revised when the API changes, or when
someone finds the energy to improve them, as a separate process from any
changes to the code.