[Doc-SIG] Approaches to structuring module documentation

Fred L. Drake, Jr. fdrake@acm.org
Fri, 12 Nov 1999 16:01:25 -0500 (EST)


--FBmNV/Tzqn
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


Manuel Gutierrez Algaba writes:
 > Is this the LaTeX one ? or the "traditional" XML ?

  I would describe the current approach as document-centric.
"Document-oriented" is how I was referring to content which was
naturally organized in documents, as opposed to data-structure-like
constructions such as my sample module reference.
  The actual syntax wasn't specific to any of the three definitions.

 > >   DOCUMENT-CENTRIC APPROACH: The human-read document is the primary
 > 
 > Is this TeEncontreX'es ? Are "module reference material" the 
 > "\indexpython" things ?

  No, by this I meant the entire section documenting the module.

 >   MICRODOCUMENT APPROACH:  Multiple DTDs are used to encode
 > document-level information and module reference material.  Let's only
 > 
 > What's this ?

  I'm not sure what "this" refers to; the term "microdocument
approach"?  I'll be more specific:
  Using a microdocument approach would involve using at least 2 DTDs,
one for module references, and another for "everything else."  Each
module reference would be a document instance all by itself (in the
SGML/XML sense), not just a file that's part of something larger (like
the current module sections; there's no meaningful way to process them
individually).  To get something like the current Library Reference,
another document (with another DTD) would specify how to put it
together: put this module, then this one, and now that section of
prose; in the next chapter, put ....  We could define separate DTDs to 
document Python modules, C APIs, and more book- or article-like
sections.   Another would be the "glue" that defines a "manual" or
"howto" document.

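  A rough sketch of how such a "glue" document might drive assembly,
written in present-day Python with xml.etree.ElementTree.  The
<manual>, <chapter>, <prose> and <include-module> elements, the href
attribute, and the rfc822 synopsis are all invented for illustration;
only the <module-reference>/<module-info> shape comes from the
attached sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical module-reference microdocuments, keyed by file name.
# In a real setup each would be its own file, valid against its own DTD.
MODULE_DOCS = {
    "mailbox.xml": "<module-reference><module-info><module>mailbox</module>"
                   "<synopsis>Read various mailbox formats.</synopsis>"
                   "</module-info></module-reference>",
    "rfc822.xml": "<module-reference><module-info><module>rfc822</module>"
                  "<synopsis>Parse RFC 822 mail headers.</synopsis>"
                  "</module-info></module-reference>",
}

# Hypothetical "glue" document: it only says what goes where.
GLUE = """
<manual title="Library Reference">
  <chapter title="Internet Data Handling">
    <prose>Modules for handling mail-related data formats.</prose>
    <include-module href="rfc822.xml"/>
    <include-module href="mailbox.xml"/>
  </chapter>
</manual>
"""


def assemble(glue_text, module_docs):
    """Return a plain-text outline in the order the glue document gives."""
    manual = ET.fromstring(glue_text)
    lines = [manual.get("title")]
    for chapter in manual.findall("chapter"):
        lines.append("  " + chapter.get("title"))
        for item in chapter:
            if item.tag == "prose":
                lines.append("    " + item.text.strip())
            elif item.tag == "include-module":
                # Each module reference is parsed as a document of its own.
                ref = ET.fromstring(module_docs[item.get("href")])
                name = ref.findtext("module-info/module")
                synopsis = ref.findtext("module-info/synopsis")
                lines.append("    %s: %s" % (name, synopsis))
    return lines


outline = assemble(GLUE, MODULE_DOCS)
print("\n".join(outline))
```

  The point of the sketch is only that each module reference can be
parsed and processed by itself, while the glue document decides the
order.
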
 > <description of things very related to TeEncontreX, I think>

  From your explanations and looking at TeEncontreX, I'd describe what
you're doing as "indexing": you're assigning terminology from a
controlled vocabulary to each entry in your document base, and using
that as a retrieval mechanism.  I think this is orthogonal to what I'm
talking about.  Regardless of a move toward a microdocument approach
or document-centric approach, good indexing is critical to make the
information accessible.
  The way you're using it (with lots of small articles) makes it very
microdocument-flavored, aside from lumping all the documents in one
file.
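
  To make "indexing" in this sense concrete, here is a minimal
inverted-index sketch in present-day Python.  The entry names are
borrowed from the mailbox sample, but the vocabulary terms assigned to
them are made up for illustration; this is not how TeEncontreX itself
stores or looks up anything.

```python
from collections import defaultdict

# Hypothetical document base: each small entry is tagged with terms
# drawn from a controlled vocabulary.
ENTRIES = {
    "UnixMailbox": ["mailbox", "file"],
    "MHMailbox": ["mailbox", "directory"],
    "Maildir": ["mailbox", "directory"],
}


def build_index(entries):
    """Map each vocabulary term to the set of entries tagged with it."""
    index = defaultdict(set)
    for entry, terms in entries.items():
        for term in terms:
            index[term].add(entry)
    return index


index = build_index(ENTRIES)
print(sorted(index["directory"]))
```

  Retrieval is then a lookup by term, independent of whether the
entries live in one file or in many.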

 > To put it short: "Lot of work coding _details_". Just a comment,
 > python is **much** better than C++, for example, because you
 > have  no need to declare every type, every detail, even, you can
 > have large parts of a python programm broken, parts that a C++
 > compiler would mark as erroneous. 

  I agree.  I think things like type annotations should be completely
optional in the documentation.  However, I think there's a lot of
value in supporting annotations that say things like "this returns a
file-like object" that can be interpreted by programmer's tools (help
system in an IDE, pylint-style analyzers, etc.).  So it should be
possible to add interesting annotations, so a programmer can ask a
tool, "What are all the ways I can get a file object?"
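
  As a sketch of the kind of query such annotations would allow, the
following present-day Python walks an abbreviated copy of the attached
sample and answers the mirror-image question, "which classes can I
construct from a file object?", using the protocol="file" attribute on
<parameter>.  The sample carries no return-type annotation, so the
"ways to get a file object" query would need markup that is not shown
here; classes_accepting is a hypothetical helper, not an existing
tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the attached mailbox-min.xml sample.
SAMPLE = """
<module-reference>
  <classdesc>
    <class>UnixMailbox</class>
    <constructor><parameter name="fp" protocol="file"/></constructor>
  </classdesc>
  <classdesc>
    <class>MHMailbox</class>
    <constructor><parameter name="dirname" type="string"/></constructor>
  </classdesc>
  <classdesc>
    <class>BabylMailbox</class>
    <constructor><parameter name="fp" protocol="file"/></constructor>
  </classdesc>
</module-reference>
"""


def classes_accepting(root, protocol):
    """Names of classes whose constructor takes a parameter of `protocol`."""
    found = []
    for classdesc in root.iter("classdesc"):
        for param in classdesc.iterfind("constructor/parameter"):
            if param.get("protocol") == protocol:
                found.append(classdesc.findtext("class"))
                break  # one matching parameter is enough for this class
    return found


root = ET.fromstring(SAMPLE)
file_consumers = classes_accepting(root, "file")
print(file_consumers)
```

  Classes without the annotation are simply skipped, which fits the
idea that the markup should be optional.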

 > > To really make it work, a lot of attention
 > > would have to be applied to the result of the first-stage conversion
 > > to check the accuracy of the results, make the various bits of text
 > > actually land in the right place (since everything is pretty much
 > > thrown together now), and encode a lot of additional information about 
 > > types, parameters, exceptions thrown, etc. 
 > 
 > More heavy work !

  But, as you point out for TeEncontreX, it's linear in the volume of
information you have plus what you want to get out of it.

 > The biggest problem I see here is that you get a very good documentation
 > ( due to the huge ammount of work) or you get nothing ( the author
 > doesn't documentate).

  We get the latter one now!  ;(

 > It'd be wise to provide several levels of marking-up , so people
 > can mark-up little by little, some important things first and so...

  This is another good reason to make a lot of the markup optional; my
example probably didn't use "maximal" markup, but went a long way toward
it.  Let's try adjusting the assumed DTD a little, and cut out a fair
bit of the markup (even if it's useful).  The file is attached; here's 
the word count:

weyr(.../Doc/lib); wc libmailbox.tex mailbox.xml mailbox-min.xml 
      53     251    1938 libmailbox.tex
     159     504    5364 mailbox.xml
     118     370    3936 mailbox-min.xml

  Still large, but definitely better.  Good enough?  I don't know.
  I do expect that at least one tool will emerge that will take a
Python source file and spit out a skeleton documentation file that can 
be filled in.
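
  A sketch of what such a skeleton generator could look like, using
the ast module from present-day Python (not something available when
this was written).  The element names follow the attached sample;
putting <method> directly inside <classdesc> and the "FILL IN"
placeholder are assumptions of the sketch, and skeleton is a
hypothetical function, not an existing tool.

```python
import ast
import xml.etree.ElementTree as ET

# A small stand-in for a module source file.
SOURCE = '''
"""Read various mailbox formats."""

class UnixMailbox:
    def __init__(self, fp):
        self.fp = fp

    def next(self):
        return None
'''


def skeleton(module_name, source):
    """Build a bare <module-reference> tree for the author to fill in."""
    tree = ast.parse(source)
    root = ET.Element("module-reference")
    info = ET.SubElement(root, "module-info")
    ET.SubElement(info, "module").text = module_name
    # Use the first line of the module docstring as the synopsis, if any.
    doc = ast.get_docstring(tree) or ""
    ET.SubElement(info, "synopsis").text = doc.split("\n")[0]
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        classdesc = ET.SubElement(root, "classdesc")
        ET.SubElement(classdesc, "class").text = node.name
        ET.SubElement(classdesc, "description").text = "FILL IN"
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            params = [a.arg for a in item.args.args[1:]]  # drop self
            if item.name == "__init__":
                elem = ET.SubElement(classdesc, "constructor")
            else:
                elem = ET.SubElement(classdesc, "method", name=item.name)
            for p in params:
                ET.SubElement(elem, "parameter", name=p)
    return root


doc_root = skeleton("mailbox", SOURCE)
print(ET.tostring(doc_root, encoding="unicode"))
```

  Only names and signatures are recovered; descriptions and any
protocol or type annotations would still have to be written by hand.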

 > This is the "TeEncontreX" version of Mailbox, this should
 > work if you have AnalizaToo.py:

  Cool; I'll run this through as soon as your package downloads again!
;-)
  Aha!  You didn't test this!  ;-)

 > Just some comments:
 > - Thinking about it, I mentioned the need for an appropos utility
 > one year ago, If you realise, this IS the apropos utility!!

  Library science types would call this kind of data marking
"indexing". 
  Saludos, amigo!   (Hey, I'm learning Spanish!  Cool! ;)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives


--FBmNV/Tzqn
Content-Type: text/xml; charset=iso-8859-1
Content-Description: More minimal sample module reference.
Content-Disposition: inline;
	filename="mailbox-min.xml"
Content-Transfer-Encoding: 7bit

<?xml version="1.0" encoding="iso-8859-1"?>
<module-reference>
  <module-info>
    <module>mailbox</module>
    <synopsis>Read various mailbox formats.</synopsis>
    </module-info>

  <overview>
    <para>This module defines a number of classes that allow easy and
      uniform access to mail messages in a mailbox.  Most of the
      supported mailbox formats come from the Unix world.</para>

    <para>None of the classes defined in this module lock the
      mailboxes that are accessed; this needs to be handled by
      application code.</para>
    </overview>

  <protocoldesc>
    <protocol>Mailbox</protocol>
    <method name="next">
      <return-value>
        A message object, or <constant>None</constant> if there
        aren't any more messages in the mailbox.
        </return-value>
      </method>
    </protocoldesc>

  <classdesc>
    <class>UnixMailbox</class>
    <protocol>Mailbox</protocol>
    <description>
      Access a classic Unix-style mailbox, where all messages are
      contained in a single file and separated by <quote>From name
        time</quote> lines.
      </description>
    <constructor>
      <parameter name="fp" protocol="file"/>
      <description>
        <para>Initialize the mailbox object and point to the first
          message in the mailbox.</para>
        </description>
      </constructor>
    </classdesc>

  <classdesc>
    <class>MmdfMailbox</class>
    <protocol>Mailbox</protocol>
    <description>
      <para>Access an <acronym>MMDF</acronym>-style mailbox, where all
        messages are contained in a single file and separated by lines
        consisting of four control-A characters.</para>
      </description>
    <constructor>
      <parameter name="fp" protocol="file"/>
      <description>
        <para>Initialize the mailbox object and point to the first
          message in the mailbox.</para>
        </description>
      </constructor>
    </classdesc>

  <classdesc>
    <class>MHMailbox</class>
    <protocol>Mailbox</protocol>
    <description>
      <para>Access an <acronym>MH</acronym> mailbox, a directory with
        each message in a separate file with a numeric name.  Messages
        that are added to the mailbox after the instance is created
        are not accessible; a new instance is needed to access newly
        added messages.</para>
      </description>
    <constructor>
      <parameter name="dirname" type="string"/>
      <description>
        <para>Initialize the list of messages that can be loaded from
          the mailbox.</para>
        </description>
      </constructor>
    </classdesc>

  <classdesc>
    <class>Maildir</class>
    <protocol>Mailbox</protocol>
    <description>
      <para>Access a Qmail mail directory.  All new and current mail
        for the mailbox is made available.  Messages that are added to
        the mailbox after the instance is created are not accessible;
        a new instance is needed to access newly added messages.
        </para>
      </description>
    <constructor>
      <parameter name="dirname" type="string"/>
      <description>
        <para>The <param>dirname</param> parameter points to the
          mailbox directory.</para>
        </description>
      </constructor>
    </classdesc>

  <classdesc>
    <class>BabylMailbox</class>
    <protocol>Mailbox</protocol>
    <description>
      <para>Access a Babyl mailbox, which is similar to an
        <acronym>MMDF</acronym> mailbox.  Mail messages start with a
        line containing only <literal>'*** EOOH ***'</literal> and end 
        with a line containing only <literal>'\037\014'</literal>.
        </para>
      </description>
    <constructor>
      <parameter name="fp" protocol="file"/>
      <description>
        <para>Initialize the mailbox object and point to the first
          message in the mailbox.</para>
        </description>
      </constructor>
    </classdesc>
</module-reference>

--FBmNV/Tzqn--