[Doc-SIG] Evolution of library documentation

Ka-Ping Yee ping@lfw.org
Sun, 11 Mar 2001 02:13:38 -0800 (PST)

[resent with individual cc addresses, since mail.python.org is down]

Hi everyone!

The introduction of pydoc places more emphasis on docstrings in the
source code.  I think this is generally good, since keeping the
documentation close to the source makes it more likely to be kept
up to date.  However, it also creates the potential for duplicated
effort in maintaining both the docstrings and the LaTeX file for
the library reference.

The LaTeX documentation seems to be motivated by the richer metadata,
the greater control over formatting, and the ability to present a
long tutorial or detailed explanation.

At the Python conference, a small group of us discussed the possibility
of merging the external and internal documentation; that is, moving
the library reference into the module source files.  It would no longer
be written in TeX, so you wouldn't need TeX in order to produce
documentation.  This would address the duplication problem and
also keep all of a module's documentation in one place together with
the module.  To avoid forcing you to page through a huge docstring
before getting to the source code, we would allow a long docstring to
go at the end of the file (or maybe collect docstrings from anywhere
in the file).

To implement this convention, we wouldn't need to change the core
because the compiler already throws out string constants if they aren't
used for anything.  So a big docstring at the end of the file would not
appear in the .pyc or occupy any memory on import; it would only be
obtainable from the parse tree, and tools like pydoc could use the
compiler module to do that.
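
As a rough illustration of the point above, here is a minimal sketch
that checks both halves of the claim on a current CPython: the trailing
string is absent from the compiled module's constants, yet still
recoverable from the parse tree.  It uses the standard ast module as a
stand-in for the compiler module mentioned above (an assumption made so
the sketch runs today); SOURCE and trailing_doc are hypothetical names
made up for the example.

```python
import ast

# Hypothetical module source: a short docstring up top, and the long
# reference text as a bare string at the end of the file.
SOURCE = '''"""Short summary docstring."""

def f():
    return 1

"""
Long reference documentation placed at the end of the file.
"""
'''


def trailing_doc(source):
    """Return a bare string that ends the module, or None if there is none."""
    body = ast.parse(source).body
    if len(body) < 2:
        # A lone string statement is the ordinary module docstring.
        return None
    last = body[-1]
    if (isinstance(last, ast.Expr)
            and isinstance(last.value, ast.Constant)
            and isinstance(last.value.value, str)):
        return last.value.value
    return None


# Compile the same source and look at what the code object keeps.
code = compile(SOURCE, "<example>", "exec")
kept = [c for c in code.co_consts if isinstance(c, str)]

print("in parse tree:", trailing_doc(SOURCE) is not None)
print("in co_consts: ", any("Long reference" in c for c in kept))
```

This only checks the module-level constants of one small example; it is
not a claim about every Python implementation or version.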

That leaves the metadata and formatting issues.  When i suggested this
idea (of merging in the external documentation) to Guido, he was
initially against it.  He was very concerned about the loss of information
in the TeX markup.  In order to even consider switching formats, he
requires that we preserve as much metadata as possible from the TeX
docs (so that, for example, we can still generate a useful index).

But i still think that getting all the docs together in one place is
a goal worth at least investigating.  So i have gone through the TeX
files in the Doc/lib directory and extracted a list of all the TeX
markup tags that are used there.  Here follows my list; i have attempted
to categorize the purpose of the tags by hand.
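
For anyone who wants to reproduce a tally like this, the following is a
minimal sketch of one way to count tags.  It is not the script used to
produce the list below: the regular expressions, the count_tags helper,
and the SAMPLE text are all assumptions made for illustration, and a
real run would read the .tex files under Doc/lib instead.

```python
import re
from collections import Counter

# \begin{name} opens an environment (a block tag); \name is a macro.
ENV_RE = re.compile(r"\\begin\{([A-Za-z]+)\}")
MACRO_RE = re.compile(r"\\([A-Za-z]+)")


def count_tags(tex):
    """Return (environment counts, macro counts) for a string of TeX."""
    envs = Counter(ENV_RE.findall(tex))
    macros = Counter(
        name for name in MACRO_RE.findall(tex)
        if name not in ("begin", "end")  # already counted as environments
    )
    return envs, macros


# Made-up sample in the style of the library reference markup.
SAMPLE = r"""
\begin{funcdesc}{open}{\var{filename}\optional{, \var{mode}}}
Open a file; see \code{close()}.
\end{funcdesc}
"""

envs, macros = count_tags(SAMPLE)
for name, n in sorted((envs + macros).items()):
    print("    %s %d" % (name, n))
```

The counts say nothing about what each tag means, so sorting the tags
into categories would still have to be done by hand.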

Fred, would you mind looking over this list to see if i have classified
the meanings of the tags correctly?

Each tag name is followed by the number of times it occurs, as a rough
measure of how important it is.  This should give us a starting point for
evaluating and discussing what kind of metadata and formatting control we
have, what is worth preserving, and what we would need to consider
supporting in a structured-text-style markup if we were to merge the
external and internal documentation.

After i've had a while to study the list, i will probably post my own
annotated list of which ones i would support and which ones i would
toss.  I encourage you to look at it and do the same.

-- ?!ng

"If I have not seen as far as others, it is because giants were standing
on my shoulders."
    -- Hal Abelson

# ------------------------------------------------------------- BLOCK TAGS

block formatting markup:
    abstract 1
    description 28
    displaymath 1
    document 1
    enumerate 6
    flushleft 1
    fulllineitems 1
    itemize 35
    list 2
    seealso 73
    sloppypar 3
    verbatim 274
    math 4

table formatting:
    longtableii 2
    tableii 34
    tableiii 24
    tableiv 1

descriptive sections for Python objects:
    classdesc 132
    datadesc 399
    datadescni 29
    excclassdesc 4
    excdesc 124
    funcdesc 1122
    funcdescni 1
    memberdesc 170
    methoddesc 1152
    methoddescni 4
    opcodedesc 104

# ------------------------------------------------------------ INLINE TAGS

special words, symbols, and math:
    ABC 3
    ASCII 58
    C 12
    Cpp 2
    EOF 19
    Large 2
    NULL 3
    POSIX 30
    UNIX 226
    copyright 1
    e 3
    frac 1
    ldots 2
    sqrt 1
    sum 1

inline formatting markup:
    cdata 11
    cfunction 84
    character 163
    code 2485
    ctype 40
    dfn 63
    email 3
    emph 163
    envvar 47
    file 174
    footnote 24
    kbd 14
    keyword 98
    longprogramopt 7
    manpage 23
    mbox 1
    mimetype 14
    platform 44
    program 65
    programopt 9
    regexp 63
    rfc 43
    samp 347
    strong 85
    textrm 9
    url 21
    var 4234

metadata fields:
    declaremodule 211
    deprecated 15
    moduleauthor 57
    modulesynopsis 220
    sectionauthor 114
    versionadded 82

TeX processing macros:
    documentclass 1
    input 226
    label 242
    nodename 20
    renewcommand 3

table cells:
    lineii 386
    lineiii 279
    lineiv 15

tagging indexable words:
    bifuncindex 35
    index 180
    indexii 92
    indexiii 17
    indexiv 1
    obindex 26
    opindex 12
    setindexsubitem 13
    stindex 12
    stmodindex 1
    ttindex 50
    withsubitem 28

cross-references:
    citetitle 11
    ref 29
    refbimodindex 31
    refmodindex 2
    refmodule 203
    refstmodindex 60
    seemodule 84
    seepep 1
    seerfc 9
    seetext 12
    seetitle 1
    seeurl 3

Python identifiers:
    class 639
    constant 348
    dataline 67
    exception 310
    funcline 61
    funclineni 1
    function 954
    member 159
    memberline 2
    method 866
    module 635
    pytype 1
    optional 734

sectioning:
    chapter 25
    section 237
    subsection 227
    subsubsection 44
    title 1