[Doc-SIG] Documentation markup & processing

Mon, 04 Jun 2001 23:48:19 -0400

Mon, 04 Jun 2001 13:35:07 EDT
on 2001-06-04, Edward D. Loper (edloper@gradient.cis.upenn.edu) wrote:
> The one thing that worries me somewhat is that David threw a
> lot of stuff at the group all at once.

Yes. I was working on refining reStructuredText (RST for short) when the
idea of the Docstring Processing System (DPS) framework arose. RST was
pretty much done and DPS was far enough along to release for feedback and
(hopefully) contributions. RST depends on DPS, and RST is an example of a
DPS component. So it made sense to release them simultaneously. Apologies
for the overload!

> I think we should try to
> address one concern at a time, to the extent possible.

Agreed. I especially don't want the two projects to influence each other.

> In that spirit, I think David's idea of creating an overall framework,
> that divides the problem into formatters, processing tools, etc., is a
> very good idea.  And I think that we can informally "ratify" such a
> framework without a pep, to help guide the efforts of those active on
> the doc-sig.  Of course, we might then turn it into a pep, and get the
> official stamp of approval, etc.

The PEP process starts with a proposal, which is what I've written. Whether
or not it gets approved is up to us; first the PEP has to be completed, and
a reference implementation too. But the existence of the PEP gives us a
focus, a baseline document to work from and on, to approve or reject.

> - What pieces should we split the problem into?  The most obvious
> pieces are parsers and outputters.  Are there subproblems that can
> be well-defined (we would need to be able to define precise
> interfaces)?

Ones that come to mind are the docstring extraction machinery (especially if
additional & attribute docstrings are okayed), and the output management
(the file/directory structure, and the output data structure when called
for). I'm sure there will be more.

> - What should be/needs to be specified by the DPS beyond the
> interfaces?  For example, it looked like David's PEPs specified
> that the DPS should never parse private member docstrings.  But
> this might be very useful to do sometimes.

Let's make it an option. The default would be to respect __all__ and skip
private members, with a "give me everything" option to override both.
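A minimal sketch of how such an option might behave. The function name
`names_to_document` and the `include_all` flag are hypothetical, not taken
from the PEPs, and the underscore test is only one possible definition of
"private":

```python
import types


def names_to_document(module, include_all=False):
    """Return the sorted names in `module` whose docstrings would be processed.

    Hypothetical helper: honours __all__ when present, otherwise skips
    names with a leading underscore, unless include_all is true.
    """
    every_name = sorted(vars(module))
    if include_all:
        # "Give me everything": ignore __all__ and the privacy convention.
        return every_name
    declared = getattr(module, "__all__", None)
    if declared is not None:
        # The module author has said explicitly what is public.
        return sorted(declared)
    return [name for name in every_name if not name.startswith("_")]


# A throwaway module object to try the helper on.
demo = types.ModuleType("demo")
demo.public = lambda: None
demo._private = lambda: None
```

With this sketch, `names_to_document(demo)` lists only `public`, while
`names_to_document(demo, include_all=True)` also lists `_private` and the
module's own dunder attributes.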

> Put a different way, where do we want to draw the
> line between "API issues" and "tool issues"?  I would argue that
> *what* gets documented is a tool issue.

Agreed.

> - What's the best way to encode the APIs?  My first instincts were
> to use XML and DOM. ... But I'm not sure that that's the best way to
> go.  The reason I say that is because I've implemented my doc
> system using DOM for some intermediate representations, and it can
> be very inconvenient.

I'd be interested to hear why.

The DOM is what I've put down for an internal intermediate data structure,
part of the API. DOM is there, doesn't need to be written. It suits the task
at hand: a tree-hierarchy of objects with attributes and text contents. Why
reinvent the wheel?
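To illustrate the kind of tree meant here, a small sketch using the
standard library's `xml.dom.minidom`. The element names (`document`,
`section`, `paragraph`) and the `name` attribute are made up for the
example and are not taken from the DTDs:

```python
from xml.dom.minidom import getDOMImplementation

# Build an empty DOM document with a root element.
impl = getDOMImplementation()
doc = impl.createDocument(None, "document", None)
root = doc.documentElement

# Objects in the tree carry attributes...
section = doc.createElement("section")
section.setAttribute("name", "overview")
root.appendChild(section)

# ...and text contents.
para = doc.createElement("paragraph")
para.appendChild(doc.createTextNode("Parsers build this tree; formatters walk it."))
section.appendChild(para)

# Serialise the tree for inspection.
xml_text = root.toxml()
```

Nothing here had to be written for the DPS itself; the node classes,
attribute handling, and serialisation all come with the DOM implementation.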

> I think that the only reason to use DOM is
> if we expect the interfaces to change, either during our
> discussions, or down the road.

I think we can depend on change.

> - It seems like we should be paying a lot of attention to the two
> DTDs that David has on sf, because those will place strong
> constraints on what parsers *can* do, and on what outputters *have*
> to handle.

One idea not in the PEPs yet is how formatters should handle unimplemented
elements. Given a reasonable solution (I don't know what it is yet),
formatters could implement a subset of the elements, and gradually grow to
completion.
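One possible approach, offered only as a sketch and not as a settled
design: a formatter looks up a handler per element and falls back to
emitting the element's text when it has none, recording what it skipped.
The class name, the `format_*` naming scheme, and the element names are
all hypothetical:

```python
from xml.dom.minidom import parseString


class PartialFormatter:
    """Formats a subset of elements; unknown ones fall back to plain text."""

    def __init__(self):
        # Tag names seen without a dedicated handler.
        self.unhandled = set()

    def format(self, node):
        if node.nodeType == node.TEXT_NODE:
            return node.data
        # Look for format_<tagname>; use the fallback if it is missing.
        handler = getattr(self, "format_" + node.tagName, self.format_unknown)
        return handler(node)

    def children(self, node):
        return "".join(self.format(child) for child in node.childNodes)

    def format_paragraph(self, node):
        return self.children(node) + "\n"

    def format_emphasis(self, node):
        return "*" + self.children(node) + "*"

    def format_unknown(self, node):
        # Degrade gracefully: keep the text, drop the markup, remember the tag.
        self.unhandled.add(node.tagName)
        return self.children(node)


tree = parseString(
    "<paragraph>Some <emphasis>marked</emphasis> and "
    "<strong>bold</strong> text.</paragraph>"
)
formatter = PartialFormatter()
output = formatter.format(tree.documentElement)
```

Here `strong` has no handler, so its text passes through unstyled and the
tag name ends up in `formatter.unhandled`. Adding a `format_strong` method
later would pick it up with no other changes.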

> I think that both DTDs have to be very well documented
> before we can accept them.  At the very least, we need
> definitions of the semantics of each element.  I'm not sure that
> everyone would be happy with the DTDs in their current state.

I agree completely, and I intend to do more documentation of the DTDs. I
invite and welcome questions & criticism.

> I think that optimally, the sig should address the following issues,
> roughly in order:

The DPS Generic Implementation Details PEP deals with #1 and #2. It's
nowhere near complete yet; I just wanted to get it out there.

> 1. Define exactly what the DPS does. ...
> 
> 2. Define the interfaces, one at a time.  Currently, there's really
> just one interface: between parsers and outputters.

I see several interfaces, actually. The parser->DPS interface, the
DPS->formatter interface, the internal intermediate data structure (shared
by parser, DPS, and formatter), the system Python interface (for when using
the DPS as a package from other code), and the command-line interface.
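A rough sketch of how the first three of these might fit together, with
the DOM tree as the shared structure. The class and function names are
hypothetical, the toy parser and formatter exist only to make the example
run, and the command-line interface is left out:

```python
from abc import ABC, abstractmethod
from xml.dom.minidom import Document, getDOMImplementation


class Parser(ABC):
    """Parser->DPS interface: raw docstring text in, DOM tree out."""

    @abstractmethod
    def parse(self, docstring: str) -> Document:
        ...


class Formatter(ABC):
    """DPS->formatter interface: DOM tree in, output text out."""

    @abstractmethod
    def format(self, tree: Document) -> str:
        ...


def process(docstring: str, parser: Parser, formatter: Formatter) -> str:
    """Python-level entry point: hand the parser's tree to the formatter."""
    return formatter.format(parser.parse(docstring))


class PlainParser(Parser):
    """Toy parser: wraps the whole docstring in a single root element."""

    def parse(self, docstring):
        doc = getDOMImplementation().createDocument(None, "document", None)
        doc.documentElement.appendChild(doc.createTextNode(docstring))
        return doc


class UpperFormatter(Formatter):
    """Toy formatter: upper-cases the text of the root element."""

    def format(self, tree):
        return tree.documentElement.firstChild.data.upper()


result = process("hello", PlainParser(), UpperFormatter())
```

Because the parser and formatter only meet through the tree, either one
could be swapped out without touching the other.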

> 2a. Agree, at least on the sig, that we like the DPS, and we intend
> to work within its framework.

:-)

> p.s., David, I'll send you URLs by the end of the week, so you can
> include some of my work in your peps. :)

Yes, please; and I will. Thanks for your preliminary comments. I look
forward to more once you've had time to go through it all.

-- 
David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - Python Docstring Processing System: http://docstring.sf.net
 - reStructuredText: http://structuredtext.sf.net
 - The Go Tools Project: http://gotools.sf.net