[Doc-SIG] syntax vs semantics: implicit --> explicit
Goodger, David
dgoodger@atsautomation.com
Fri, 30 Mar 2001 13:57:03 -0500
I've found that many of our discussions about auto-documentation
generators unnecessarily (and confusingly) mix arguments from
different levels (syntax vs. semantics, and multi-layered semantics at
that). In an effort to make the implicit explicit, and to reduce
confusion & frustration, I think it's important to separate our
discussions by individual component. At the very least, we should be
conscious of 'where we're coming from' and say so explicitly.
For example, I think it's counterproductive to talk about the syntax
of a particular construct (e.g. characters used to delimit literals)
in the same breath as talking about a Python-specific concept (e.g.
hyperlinks generated from the interpretation of literals in a
Python-specific context). If the syntax is right, the semantics should
fit. Of course, the syntax discussion is at least partially being
driven by semantics. I am proposing that we be more explicit about the
motivations behind our suggestions.
On to a definition of terms, using block diagrams (useful for a
blockhead like me :-):

The parser is the basic component which takes raw text as input and
produces a data structure as output::

                 +--------+
        text --> | parser | --> parsed data structure
                 +--------+     (internal, e.g. DOM tree)
Depending on what we want to do with the data, we'll need output
formatters::

                            +-----------+
        structured data --> | formatter | --> formatted data
          (internal)        +-----------+     (XML, HTML, TeX, info, etc.)
A simple converter program would just need to link the two::

                 +------------------------------+
                 |           converter          |
                 | +--------+     +-----------+ |
        text --> | | parser | --> | formatter | | --> formatted data
                 | +--------+     +-----------+ |
                 +------------------------------+
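The three diagrams above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `Node`, `parse`, `format_html`, `format_xml` and `convert` are hypothetical, and the "markup" is reduced to blank-line-separated paragraphs. The point is just that the formatters see the tree, never the raw text.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """One node of the internal tree (a stand-in for a DOM tree)."""
    kind: str                                   # "document" or "paragraph"
    text: str = ""
    children: list = field(default_factory=list)


def parse(text):
    """Parser: raw text in, tree out. Blank lines separate paragraphs."""
    doc = Node("document")
    for block in text.split("\n\n"):
        words = block.split()
        if words:                               # ignore empty blocks
            doc.children.append(Node("paragraph", " ".join(words)))
    return doc


def format_html(doc):
    """Formatter: tree in, HTML out. Knows nothing about the input syntax."""
    return "\n".join(f"<p>{escape(p.text)}</p>" for p in doc.children)


def format_xml(doc):
    """A second formatter working from the same tree."""
    body = "".join(f"<paragraph>{escape(p.text)}</paragraph>" for p in doc.children)
    return f"<document>{body}</document>"


def convert(text, formatter=format_html):
    """Converter: nothing more than parser and formatter linked together."""
    return formatter(parse(text))
```

Swapping the output format is then a matter of passing a different formatter, e.g. `convert(text, format_xml)`; the parser is untouched.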
Now, when we get into auto-doc-generators (like HappyDoc, Crystal,
pydoc, etc.), we need to add Python-specific knowledge to the mix::

    +-------------------------------------------+
    |       Python Documentation Processor      |
    |                                           |
    | +---------------------------------------+ |
    | | operating logic:                      | |
    | | knowledge of Python syntax, docstring | |
    | | conventions and rules                 | |
    | +---------------------------------------+ |
    |                                           |
    | +-----------------+      +------------+   |
    | | Structured Text |      | output     |   |
    | | parser          |      | formatters |   |
    | +-----------------+      +------------+   |
    | | Python-specific |                       |
    | | extensions      |  +------------------+ |
    | +-----------------+  | Python language  | |
    | +--------------+     | services         | |
    | | (potentially |     | (parser.py, xml, | |
    | | other input  |     | inspect, etc.)   | |
    | | parsers)     |     |                  | |
    | +--------------+     +------------------+ |
    +-------------------------------------------+
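As a rough sketch of how the boxes in this diagram might fit together, the code below uses the standard `inspect` module as the "Python language services" and keeps the docstring conventions in one place. The function names (`parse_docstring`, `format_html`, `document_object`) and the `Greeter` sample class are hypothetical, and the parser is a trivial stand-in, not any existing Structured Text implementation.

```python
import inspect
from html import escape


def parse_docstring(text):
    """Stand-in markup parser: returns a list of paragraph strings."""
    return [" ".join(block.split()) for block in text.split("\n\n") if block.strip()]


def format_html(name, paragraphs):
    """Stand-in output formatter for one documented object."""
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<h2>{escape(name)}</h2>{body}"


def document_object(obj, formatter=format_html):
    """Operating logic: which objects get documented, and in what order.

    Assumed convention: names starting with an underscore are skipped.
    """
    members = [(obj.__name__, obj)] + inspect.getmembers(obj, inspect.isfunction)
    sections = []
    for name, member in members:
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(member)            # language service: cleaned docstring
        if doc:
            sections.append(formatter(name, parse_docstring(doc)))
    return "\n".join(sections)


class Greeter:
    """Say hello.

    Used only as sample input.
    """

    def greet(self):
        """Return a greeting."""
        return "hello"

    def _secret(self):
        """Skipped by the underscore convention."""


html_output = document_object(Greeter)
```

Note that `document_object` never looks at markup characters, and `parse_docstring` never looks at Python objects; either could be replaced without touching the other.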
I don't know about others on this list, but I would like to use an
ST-like markup language for more than just Python docstrings. I'd like
to use it for documentation of all kinds, from how-to manuals to web
pages (to books even, for crazies like me). Section hierarchy
features (section titles) matter less in Python docstrings than in,
say, a magazine article. This forum, of course,
is specifically geared toward Python documentation. But am I
unreasonable in thinking that this markup scheme has broader
applications? See the Setext specification
(http://www.bsdi.com/setext/) for its history; basically, it was used
for a pre-web electronic newsletter, TidBits, whose texts were quite
long.
(Last year I wrote a chapter on Python for Wrox Press' "Professional
Linux Programming". I would have been much happier using a complete
ST-like markup than futzing around in MSWord.)
I believe that the operating logic/rules/conventions ought to be
separated conceptually and code-wise from the parser. The parser
itself should be separated into generic and Python-specific parts.
These things should not be tied together, at least not so strongly.
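One way this split could look, as a sketch only: a generic parser marks up literals without knowing what they mean, and a separate Python-specific pass decides which literals name known objects and should become hyperlinks. The single-quote delimiter, the token tuples, and the names `parse_inline`, `link_python_names` and `to_html` are all assumptions made for illustration.

```python
import re
from html import escape

# Assumed literal delimiter, for illustration only. A real syntax would
# also have to cope with apostrophes in ordinary words.
LITERAL = re.compile(r"'([^']+)'")


def parse_inline(text):
    """Generic part: split text into ("text", s) and ("literal", s) tokens."""
    tokens, pos = [], 0
    for match in LITERAL.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("literal", match.group(1)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


def link_python_names(tokens, known_names):
    """Python-specific part: literals naming known objects become links."""
    return [
        ("link", value) if kind == "literal" and value in known_names else (kind, value)
        for kind, value in tokens
    ]


def to_html(tokens):
    """Formatter: renders whatever token kinds it is given."""
    out = []
    for kind, value in tokens:
        value = escape(value)
        if kind == "literal":
            out.append(f"<code>{value}</code>")
        elif kind == "link":
            out.append(f'<a href="#{value}"><code>{value}</code></a>')
        else:
            out.append(value)
    return "".join(out)


tokens = parse_inline("Call 'spam' then 'eggs'.")
linked = link_python_names(tokens, known_names={"spam"})
```

Used for a how-to manual or a web page, the `link_python_names` step is simply left out and every literal renders as plain `<code>`.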
Opinions? Flames? I've got my asbestos suit on!
Thanks for reading my idle ramblings!
/DG