[Doc-SIG] which characters to use for docstring markup

Sun, 08 Apr 2001 16:08:18 -0700

>At 03:40 PM 4/8/01 EDT, Edward D. Loper wrote:
>> At 04:06 PM 4/6/01 -0500, Guido van Rossum didn't write :-)
>> At 02:21 PM 4/7/01 -0700, I wrote:
>> FYI, I have the HappyDoc formatter/docstring extractor
>> (happydoc.sourceforge.net),
>> generating standard Python documentation LaTeX from docstrings. It's kind
>> of nice, because it means that I immediately have all of the python.sty
>> features available
>> to me for cross-referencing etc., plus it immediately gives me my docstring
>> derived documents in PDF, PS, HTML, and info (if I can get info working
>> again).
>
>The only formatters I could find for HappyDoc use StructuredTextClassic,
>or some variant.  And many people (incl. Guido) are not terribly happy
>with ST.  Does the formatter you're talking about use something else?
>What does it do about lists, etc?

The formatter is an extension that I've added to HappyDoc. I'm working with
the author to get the changes back into the distribution; with luck they
may be done RSN (days). I hope they will be adopted into the next version
(the changes are small, and it really just introduces a new hdformatter).

>> def foo():
>> 	r"""
>> 	My \code{foo} function \emph{breaks} the
>> 	\module{bar} module.\index{Foos and Bars}
>> 	"""
>> 	pass
>
>Most people have objected to "heavyweight" markup for docstrings.. 
>i.e., they don't want to have to write docstrings in LaTeX or XML
>or whatever..  It *looks* like you're basically just writing 
>docstrings using some subset of LaTeX? 

Yes it's a subset - I should have made that clear. There is a subset of
LaTeX implicitly defined by the Python \file{Doc/} tools, by virtue of
the constraint that the output be generatable in HTML and info as well.

It's really the subset of LaTeX that is equivalent to TeXinfo (give or
take some minor naming differences). For the sake of discussion, let me
call this LaTeXinfo. The subset contains all of what you need for
docstring highlighting etc., plus (and in my eyes it is a big plus)
everything you need for cross-referencing, TOC and indexing of a group of
modules. Let's also say, for now, that it contains nothing else.

Heavyweight is a relative term of course, and I think most users of TeXinfo
feel it's not too heavy. It's a fair balance between light and complete.

> If so, we'd have to carefully
>define *which* subset, and what everything means, etc., before I
>would accept it.  We don't want people assuming that, just because
>they can use \emph{...}, they can use all their other favorite LaTeX
>commands (we do, after all, want it to be possible to convert this
>to HTML, info pages, etc.)

Agreed. The subset is well defined and documented already, and in
widespread use as the current documentation standard for Python.
The current installed base equals the installed base of Python.

>> 1)	It's quite complete for all of the intended uses (\emph{...}).
>> 	Because it's more or less TexInfo compatible, most people know it
>> 	or can learn it easily, even if you don't know LaTeX.
>
>But you can't make it *too* "complete," or it won't be a standard that
>people can write tools to process anymore..  We don't want to just
>reimplement LaTeX here...

You're right. I find the subset to be complete enough, especially for
docstrings, and the tools are already written. It has to stay small
anyway, so that everything in it can still be rendered as info.

>> 2)	It means you can cut and paste between docstrings and the formal
>> 	module documentation for Doc/.
>> 3)	The macros/commands are already completely documented, and the 
>> 	documentation for them ships with the core distribution.
>> 4)	It would reinforce the use of the Doc/ tools.
>
>I actually am not too familiar with the Doc/ tools..  Can you give
>me a pointer to them?  

They ship with every Python distribution; alternatively, take a look at
\citetitle[http://www.python.org/doc/current/doc/doc.html]{Documenting Python}

In my view, the documentation is one of Python's strengths, and the
benefits of having standardized the documentation early are huge.
But documentation is always a painful task, and I think there are real
benefits to a documentation approach that is scalable, from docstrings
all the way up to the reference documentation.

>But copy/paste does seem useful.  (Although it 
>should at least be possible to write conversion tools, in any case, 
>given a good standard).

It's \emph{really} nice to have the \key{PASTE} key as a conversion
tool. I find myself documenting modules a lot, classes a little, and so
on, and by the time that's done, a first draft of the reference
documentation already exists.

>> 5)	It would reinforce the use of HappyDoc (semi-literate programming).
>
>Happydoc seems like a nice tool.  Whatever markup language we settle
>on (if we ever do), a HappyDoc formatter will probably be implemented..

I only wrote the LaTeXinfo extension to HappyDoc last week, and already
I'm very Happy \grin.  Yet the LaTeXinfo version is by far the most advanced:
having my entire module and class structure documented with indexing, Table
of Contents and cross-references, in HTML, info and PDF is huge.*

>> 7)	It's likely to be mainly backward compatible - I doubt many
>> 	docstrings use \ a lot. On the other hand, I bet a lot of
>> 	them use blank line as a paragraph separator.
>
>I believe that trying to be "backward compatible" with a markup language
>is an extremely dangerous thing to do, esp. if your markup language is
>relatively "forgiving," because you probably won't *notice* the places
>where it gets confused.  I would rather be explicitly non-backward-
>compatible.

Sorry, what I meant was backward compatible with all the current docstrings
in the existing Python library. It is backward compatible in the sense that:

\begin{enumerate}
\item	There are very few occurrences of \textbackslash.
\item	A blank line implies a paragraph.
\end{enumerate}
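
The first claim is easy to check for any given module. What follows is only
a rough sketch, not part of HappyDoc or the Doc/ tools: it assumes a
present-day Python with the standard ast module, and the helper name
docstrings_with_backslash and the SAMPLE text are made up for illustration.

```python
import ast

def docstrings_with_backslash(source):
    """Return (total, with_backslash) docstring counts for Python source text."""
    tree = ast.parse(source)
    total = 0
    flagged = 0
    documented = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if not isinstance(node, documented):
            continue
        doc = ast.get_docstring(node, clean=False)
        if doc is None:
            continue
        total += 1
        if "\\" in doc:
            flagged += 1
    return total, flagged

# Hypothetical module text: three docstrings, one of which uses markup.
SAMPLE = r'''
"""Module docstring.

Second paragraph after a blank line."""

def plain():
    """No markup here."""

def marked():
    r"""Uses \emph{markup}."""
'''

total, flagged = docstrings_with_backslash(SAMPLE)
print(total, "docstrings,", flagged, "containing a backslash")
```

To survey a real library module, pass the text of its source file to the
same function instead of SAMPLE.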

This would not be true if we went to an HTML markup system, for example: all
current docstrings in existing code would require the insertion of
\code{<P>} for the blank lines, plus worrying about the more frequently used
\samp{<} and \samp{>} characters.  Small details, but nice.

Implementing docstring markup this way would take only three steps:
\begin{enumerate}

\item	Define a subset of LaTeXinfo commands that would be admissible in
docstrings.  You could start very small, and add to them in time.

\item	Change the page where these commands are documented in the Python
documentation to separate out the docstring subset onto its own page, and
tell people about the \code{r"""markup"""} trick (see below).

\item	Implement the tty parser for docstrings so that they look pretty
at the terminal. For this, it is important to note that there are existing
reference implementations of a tty representation (info in C, info in Emacs),
so presumably you could blindly copy the info representations. That way
you would be compatible with the primordial Python IDE: Emacs.

\end{enumerate}
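
To give a feel for how small steps 1 and 3 could start out, here is a
minimal sketch. Everything in it is an assumption made for illustration:
the four-command subset, the names RENDERERS and render_tty, and the
plain-text renderings, which are only loosely modelled on how info shows
emphasis and code. It does not handle nested braces.

```python
import re

# Hypothetical docstring subset: command name -> how its argument is
# shown on a plain terminal.
RENDERERS = {
    "emph": lambda text: "_" + text + "_",
    "code": lambda text: "`" + text + "'",
    "module": lambda text: text,
    "index": lambda text: "",   # index entries produce no visible text
}

# Matches \name{argument} where the argument contains no braces.
COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")

def render_tty(docstring):
    """Render a docstring as plain text; reject commands outside the subset."""
    def replace(match):
        name, arg = match.group(1), match.group(2)
        if name not in RENDERERS:
            raise ValueError("command not in docstring subset: \\" + name)
        return RENDERERS[name](arg)
    return COMMAND.sub(replace, docstring)

print(render_tty(r"My \code{foo} function \emph{breaks} the "
                 r"\module{bar} module.\index{Foos and Bars}"))
```

Rejecting unknown commands outright, rather than passing them through,
is one way to keep people from assuming that all of their favorite LaTeX
commands are available.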

Note also that a lot of the mileage of this approach is gained from
the fortuitous coincidence that r""" docstring with \LaTeX\ markup""" works
for docstrings too, which makes it easy to use backslashes:

def foo():
	r"""
	My \code{foo} function \emph{breaks} the
	\module{bar} module.\index{Foos and Bars}
	"""
	pass
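
The raw-string prefix matters because some command names begin with
letters that Python treats as escape sequences in an ordinary string
(\f in \file, \t in \textbackslash). A small sketch of the difference,
with the function names cooked and raw chosen purely for illustration:

```python
def cooked():
    """See \file{Doc/} and \textbackslash."""

def raw():
    r"""See \file{Doc/} and \textbackslash."""

# Without the r prefix, \f became a form feed and \t became a tab,
# so the command names are damaged.
print(repr(cooked.__doc__))

# With the r prefix, every backslash survives untouched.
print(repr(raw.__doc__))
```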

Mike.

* \footnote{If people want, I can put a copy of a HappyDoc LaTeXinfo generated
PDF file up on starship for people to browse.}

PS: My apologies if anyone was misled by the apparent misattribution of my
previous post; it was Edward D. Loper I was quoting.