[Python-Dev] Python Doc problems

Talin talin at acm.org
Sun Oct 8 00:10:58 CEST 2006


skip at pobox.com wrote:
>     Andrew> In such autogenerated documentation, you wind up with a list of
>     Andrew> every single class and function, and both trivial and important
>     Andrew> classes are given exactly the same emphasis.  
> 
> I find this true where I work as well.  Doxygen is used as a documentation
> generation tool for our C++ class libraries.  Too many people use that as a
> crutch to often avoid writing documentation altogether.  It's worse in many
> ways than tools like epydoc, because you don't need to write any docstrings
> (or specially formatted comments) to generate reams and reams of virtual
> paper.  This sort of documentation is all but useless for a Python
> programmer like myself.  I don't really need to know the five syntactic
> constructor variants.  I need to know how to use the classes which have been
> exposed to me.

As someone who has submitted patches to Doxygen (and actually had them 
accepted), I have to say that I agree as well. At my work, it used to be 
standard practice for each project to have a web site of "documentation" 
that was generated by Doxygen. Part of the reason for my patches (which 
added support for parsing of C# doctags) was in support of this effort.

However, I gradually realized that there's no actual use-case for 
Doxygen-generated docs in our environment.

Think about the work cycle of a typical C++ programmer. Generally when 
you need to look up something in the docs for a module, you either need 
specific information on the type of a variable or params of a function, 
or you need "overview" docs that explain the general theory of the module.

Bear in mind also that the typical C++ programmer is working inside of 
an IDE or other smart editor. Most such editors have a simple 
one-keystroke method of navigating from a symbol to its definition.

In other words, it is *far* easier for a programmer to jump directly to 
the actual declaration in a header file - and its accompanying 
documentation comments - than it is to switch over to a web browser, 
navigate to the documentation site, type in the name of the symbol, and 
hit search. Why would I *ever* use HTML reference documentation when I can 
just look at the source, which is much easier to get to? Especially 
since the source often tells me much more than the docs would.

The only time generated reference docs make sense is when you are working 
with a module whose source code you don't have - which, even in a 
proprietary environment, is something to be avoided whenever possible. 
(The source may not be 'open', but that doesn't mean that *you* can't 
have access to it.) If you have the source - and a good indexing system 
in your IDE - there's really no need for Doxygen.

Of course, the web-based docs are useful when you need an overview - but 
Doxygen doesn't give you that. As a result, I have been trying to get 
people to stop using Doxygen as a "crutch" as you say - in other words, 
if a team has the responsibility to write docs for their code, they 
can't just run Doxygen over the source and call it done.

(Too bad there's no way to automatically generate the overview! :)

While I am in rant mode (sorry), I also want to mention that most 
documentation markup systems have a source readability cost - i.e. 
having embedded tags like @param makes the original source less readable; 
and given what I said above about the source being the primary reference 
doc, it doesn't make sense to clutter up the code with funny @#$ characters.

If I were going to use any markup system in the future, the first thing I 
would insist on is that the markup be "invisible" - in other words, the 
markup should look just like normal comments, and the markup scanner 
should be smart enough to pick out the structure without needing a lot 
of hand-holding. For example:

    /*
       Plot a point at position x, y.
       'x' - The x-coordinate.
       'y' - The y-coordinate.
    */
    void Plot( int x, int y );

The scanner should note that 'x' and 'y' are in single quotes, so they 
probably refer to code identifiers. It can also see that they are both 
parameters of the function, so there's no need to tell it that 'x' is an 
@param.
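
Here is a rough sketch, in Python, of the kind of deduction I mean. The 
function name infer_param_docs and the regular expressions are made up for 
illustration; they are not part of Doxygen or any existing tool, and the 
sketch assumes a simple declaration like the Plot example above (no default 
arguments, templates, or function pointers).

```python
import re


def infer_param_docs(comment, declaration):
    """Return {parameter name: description} deduced from a plain comment.

    Hypothetical helper: a quoted name counts as a parameter only if it
    also appears in the declaration's parameter list.
    """
    # Take the text between the first pair of parentheses as the parameter list.
    params = []
    match = re.search(r"\(([^)]*)\)", declaration)
    if match:
        for piece in match.group(1).split(","):
            words = re.findall(r"[A-Za-z_]\w*", piece)
            if len(words) >= 2:  # at least a type followed by a name
                params.append(words[-1])

    # Look for comment lines shaped like:  'name' - description
    docs = {}
    for line in comment.splitlines():
        m = re.match(r"\s*'([A-Za-z_]\w*)'\s*-\s*(.+?)\s*$", line)
        if m and m.group(1) in params:
            docs[m.group(1)] = m.group(2)
    return docs


comment = """
   Plot a point at position x, y.
   'x' - The x-coordinate.
   'y' - The y-coordinate.
"""
print(infer_param_docs(comment, "void Plot( int x, int y );"))
```

The programmer never writes @param; the quoted names are matched against 
the declaration, and quoted names that are not parameters are simply 
ignored by this sketch.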

In other words, the programmer should never have to type anything that 
can be deduced from looking at the code itself. And the reader shouldn't 
have to read a bunch of redundant information which they can easily see 
for themselves.

> I guess this is a long-winded way of saying, "me too".
> 
> Skip

ditto.

-- Talin



