[Python-Dev] pydoc improvements

Noam Raphael noamr at myrealbox.com
Sat Jun 12 20:48:11 EDT 2004


I've made a few improvements to the pydoc module. I tried to send
them to Ka-Ping Yee, but it seems the e-mail didn't get through, so
I'm presenting them here on python-dev, with Ping included. I hope they
will be reviewed and, if found appropriate, accepted.

Pretty-Printed Source Code
   Often, in order to understand exactly what a function does, I want
to look at its source code. This is especially true for Python,
because of its readability. So I made pydoc produce, along with the HTML
documentation of a module, highlighted source code for the module,
interlinked with its documentation.
   I added a new method to the HTMLDoc class, sourcemodule (you may think
of a better name). It takes a filename and a few more parameters, and
returns the pretty-printed source code in HTML format. Class and method
definitions link to their descriptions in the generated
module documentation. I also made class and function definitions in the
module documentation link to their definitions in the source code.
   I've changed the second link at the top right of a module's
documentation to point to the HTML source instead of the original file,
because I think people usually want to see the code in
their browser, not download it. In case they are interested in
downloading it, I added a link called "download <filename>" to the HTML
source, so people can click it and hopefully get the plain source code.
   The file name of the HTML source code is "<module name>-source.html".
It is now generated automatically along with the "<module name>.html",
and is served by the HTTP server.
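   A minimal sketch of the interlinking idea, not the code from the
patch: the function name source_to_html, the doc_page parameter and the
anchor scheme (bare definition names) are all assumptions made for
illustration. Each "def" or "class" line becomes both an anchor and a
link back to the documentation page.

```python
import html
import re

# Matches the start of a function or class definition line.
DEF_RE = re.compile(r"^\s*(def|class)\s+([A-Za-z_]\w*)")


def source_to_html(source, doc_page):
    """Return `source` as HTML, linking definitions to `doc_page`.

    Hypothetical helper: anchors use the bare definition name, so two
    methods with the same name in different classes would collide.
    """
    out = []
    for line in source.splitlines():
        escaped = html.escape(line)
        match = DEF_RE.match(line)
        if match:
            name = match.group(2)
            escaped = '<a name="%s" href="%s#%s">%s</a>' % (
                name, doc_page, name, escaped)
        out.append(escaped)
    return "<pre>\n%s\n</pre>" % "\n".join(out)
```

   A real implementation would also need qualified names for methods
and some actual highlighting; this only shows the linking step.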

Public and Private
   Public attributes are defined by the current pydoc version as 
attributes whose name doesn't start with an underscore, or, for module 
attributes where __all__ is defined, attributes which appear in __all__.
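   The rule just described can be sketched as follows. The function
name is_public and its all_names parameter (standing in for a module's
__all__, or None when it is undefined) are illustrative, not pydoc's
actual API.

```python
def is_public(name, all_names=None):
    """Apply the public/private rule described above.

    `all_names` is the module's __all__ if it defines one, else None.
    For class attributes, leave it as None.
    """
    if all_names is not None:
        return name in all_names
    return not name.startswith("_")
```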
   Currently pydoc doesn't show private objects at all. This is usually
what people want, but sometimes, when they are interested in what the 
module really does (for example when they want to define a subclass), 
they may want to view the documentation for private objects too. So I 
added a link to the module documentation header, which alternates 
between "Show Private" and "Hide Private", and shows or hides private 
attributes - both module attributes and class methods. To accomplish 
this, I defined two alternative style sheets: one which shows private 
attributes and one which doesn't. The "Show/Hide Private" link uses 
JavaScript to switch between these two styles. (The style can also be
selected from a menu in the Mozilla browser.) I used a cookie to save the
user's preferred style - if I hadn't, the style would have reverted to the
default every time a new documentation page was opened. The produced HTML
is standards-compliant, to my knowledge, and works with both Mozilla and
IE. The style-switching link doesn't work in Konqueror, so anyone using it
won't be able to see private attributes - not too bad.
   Many modules contain objects imported from other modules. Usually
these are for the module's own use, and the module's user isn't 
interested in them. These are not even private - I don't think anyone 
will want to see their documentation. In the current version of pydoc, 
these are filtered by calling inspect.getmodule. This raises two 
problems: the first is that inspect.getmodule can't deduce the module of 
built-in objects, so the module documentation may be filled with such
"garbage" (it actually happened to me, before I started using my version
of pydoc). The second problem is that sometimes a module wants attributes
of other modules to be considered as its own attributes too (for 
example, the tokenize module exports the constants of the token module). 
My solution is this: attributes which are not public attributes of a
module, and are public attributes of another imported module, are never
displayed. They are probably the result of an internal "from module 
import something" statement.
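   A rough sketch of this filtering rule, under stated assumptions:
the helper names public_names and hidden_imports are hypothetical, and
the caller is assumed to supply the list of other imported modules to
check against (for example, the values of sys.modules). It compares
names only, exactly as the rule is worded above.

```python
def public_names(module):
    """Names a module exposes: __all__ if defined, else non-underscore names."""
    all_names = getattr(module, "__all__", None)
    if all_names is not None:
        return set(all_names)
    return {name for name in vars(module) if not name.startswith("_")}


def hidden_imports(module, other_modules):
    """Names in `module` that should never be displayed.

    A name is hidden when it is not public in `module` but is public in
    one of `other_modules`.
    """
    own_public = public_names(module)
    foreign_public = set()
    for other in other_modules:
        if other is not module:
            foreign_public |= public_names(other)
    return {name for name in vars(module)
            if name not in own_public and name in foreign_public}
```

   With this rule, a module that re-exports another module's constants
through its own __all__ (as tokenize does with token) keeps them
visible, because they are public attributes of the module itself.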
   The attached patch is against version 1.91, because I implemented
the use of __all__ in a slightly different way than version 1.92 does.

More flexible HTML documentation of a complete directory
   Here's my actual situation: There's a package maintenance system,
where you first create files in your home directory and then use the 
system to install them in a common place. For example, if a package is 
called foo, and includes the modules ya.py and chsia.py, the modules 
will be named /home/noam/foo/lib/python/{ya,chsia}.py. I want to 
automatically document them, so that the HTML files will sit in 
/home/noam/foo/doc. After installing the package, the python modules 
will be copied to /usr/local/lib/python/, and the documentation will be 
copied to /usr/local/doc/foo.
   I want to be able to type one command to generate all the
documentation, but I don't want to waste time if I only need the 
documentation re-generated for one module. So I made "pydoc -w <dir>"
regenerate a module's documentation only if the target HTML file was
last modified before the module was. (pydoc -w <filename> still writes
the documentation even if it seems up to date.)
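   The freshness test can be sketched as a comparison of modification
times. The function name needs_regeneration is made up for this
example, and treating a missing HTML file as stale is an assumption
that the text above does not spell out.

```python
import os


def needs_regeneration(source_path, html_path):
    """Return True if the HTML file is older than the module source.

    Assumption: a missing HTML file also counts as out of date.
    """
    try:
        html_mtime = os.path.getmtime(html_path)
    except OSError:
        return True
    return html_mtime < os.path.getmtime(source_path)
```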
   It is perfectly legitimate to write the HTML documentation for a
module yourself. So I made pydoc never overwrite HTML files which weren't
created by pydoc. To accomplish this, I added the tag <meta
name="generator" content="pydoc"> to the head of all HTML files produced
by pydoc. A warning is printed if a module was modified more recently
than its manually created documentation.
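   One way such a check might look, as a sketch only: may_overwrite is
a hypothetical name, and a plain substring search for the tag is an
assumption about how the marker is detected.

```python
import os

# The marker tag quoted above, written into every pydoc-generated page.
GENERATOR_TAG = '<meta name="generator" content="pydoc">'


def may_overwrite(html_path):
    """Allow writing if the file is absent or was generated by pydoc."""
    if not os.path.exists(html_path):
        return True
    with open(html_path, encoding="utf-8", errors="replace") as f:
        return GENERATOR_TAG in f.read()
```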
   I wanted to be able to run "pydoc -w" from any directory, not just
from /home/noam/foo/doc, so I added an option "-o <dir>", to specify the
output directory. (If it isn't specified, the current directory is used.)
   I wanted the documentation to refer to the installed file 
(/usr/local/lib/python/foo.py), not to the file in my home directory. So 
I added an option "-d <destdir>", to specify the directory in which the 
files will reside after being installed.
   To summarise, the command which I should now run to produce the
documentation is
pydoc -w -o doc/ -d /usr/local/ lib/python/
(running it from /home/noam/foo).
   The -d and -o options are already available in compileall.py, for 
exactly the same reasons, I think.
   None of this should cause backward compatibility issues.

Module loading
   Currently, safeimport(), used by writedocs(), takes only a module name
and imports it using __import__(). This means that when you document a
complete directory, you can only produce HTML documentation for modules
which are already installed, and that documentation for the wrong
version of a module may be produced. I changed safeimport() to take an
optional list of directories in which to look for the module/package, in
which case it uses imp.find_module and imp.load_module instead of
__import__. This means that when producing documentation for a complete
directory, the actual files in the directory are used, not installed
modules of the same name.
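
   For readers on current Python: the imp module was removed in Python
3.12, so the sketch below swaps in importlib to show the same idea of
searching only the given directories. It is not the code from the
patch; load_from_dirs is a hypothetical name, and it handles only
top-level modules.

```python
import importlib.machinery
import importlib.util


def load_from_dirs(name, dirs):
    """Import top-level module `name`, searching only `dirs`.

    The module is deliberately not registered in sys.modules, so an
    installed module of the same name is left untouched. Packages that
    import their own submodules would need that registration.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, dirs)
    if spec is None or spec.loader is None:
        raise ImportError("no module named %r in %r" % (name, dirs))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```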

   I changed the docmodule methods to use cStringIO.StringIO.write
instead of s = s + something.
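   The pattern looks roughly like this. The sketch uses io.StringIO,
the Python 3 counterpart of cStringIO.StringIO; render_items is an
invented example function, not part of pydoc.

```python
import io


def render_items(items):
    """Build an HTML fragment by writing pieces into a buffer."""
    out = io.StringIO()
    for item in items:
        # Replaces the repeated `s = s + something` concatenation.
        out.write("<li>%s</li>\n" % item)
    return out.getvalue()
```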
   In do_GET, forceload was set to True for some reason. It caused
problems when documenting modules which pydoc itself uses, and I don't
see why modules should be forced to load, so I changed it to False.
   The link to a file was "file:"+path. I changed it to "file://"+path,
because my browser (Mozilla) treated the path as a relative path.
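   As an aside, on current Python the same URL form can be obtained
from pathlib instead of string concatenation. This is a sketch with a
hypothetical helper name, file_url, and it assumes an absolute POSIX
path.

```python
from pathlib import PurePosixPath


def file_url(path):
    """Build a file: URL from an absolute POSIX path."""
    # as_uri() raises ValueError for relative paths.
    return PurePosixPath(path).as_uri()
```

   For /usr/local/lib/python/foo.py this gives the same string as
"file://"+path.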

Attached is a diff against version 1.91 of pydoc.py (see the section on 
public and private).
I would be glad if, after your review, at least some of these changes
were accepted.

Have a happy day,
Noam Raphael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pydoc.py.diff
Type: text/x-patch
Size: 58687 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20040613/8e42d480/pydoc.py-0001.bin
