[Doc-SIG] DPS components

David Goodger goodger@users.sourceforge.net
Thu, 20 Sep 2001 00:12:53 -0400


I forgot all about the fifth component: "filers" (formerly "output
management"). There is one filer for each way of storing the results
of processing:

- In a single file on disk.
- In a tree of directories and files on disk.
- In a single tree-shaped data structure in memory.
- In a tree of data structures in memory. (Maybe.)

Unlike readers, parsers, designers, and writers, which may be
numerous, I expect only a small number of filers: namely, those listed
above.
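Here is a minimal Python sketch of two of the filers listed above. One stores the writer's output in a single file on disk. The other keeps results in memory. The class names, the `file(document, writer)` method, and the idea that a writer is a callable returning a string are all hypothetical, chosen only to show where the filer/writer boundary might fall.

```python
import os
import tempfile


class SingleFileFiler:
    """Hypothetical filer: stores the whole result in one file on disk."""

    def __init__(self, path):
        self.path = path

    def file(self, document, writer):
        # The writer decides what the text looks like; the filer only
        # decides where that text is stored.
        text = writer(document)
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(text)
        return self.path


class InMemoryFiler:
    """Hypothetical filer: keeps results in a dict keyed by fragment name."""

    def __init__(self):
        self.store = {}

    def file(self, document, writer, name="index"):
        self.store[name] = writer(document)
        return self.store[name]


def upper_writer(document):
    # Stand-in writer for the example: the "document" is just a string.
    return document.upper()


if __name__ == "__main__":
    print(InMemoryFiler().file("hello, world", upper_writer))
    path = os.path.join(tempfile.mkdtemp(), "out.txt")
    SingleFileFiler(path).file("hello, world", upper_writer)
```

A directory-tree filer would follow the same shape, writing one file per fragment.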

[Tony]
> Of course, the "reader" for plain .rst files is *ever so* simple!

It is the simplest reader, but it still has to do some work, such as
hyperlink resolution and footnote numbering. That's on my to-do-soon
list.
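To illustrate the kind of work even the simplest reader does, here is a sketch of a footnote-numbering pass. It assumes a hypothetical flat node list in which footnotes and footnote references are dicts carrying a `label`. Footnotes are numbered in the order they appear. Each reference then receives the number of the footnote with the same label.

```python
def number_footnotes(nodes):
    """Hypothetical reader pass: number footnotes, then their references.

    Nodes look like {"type": "footnote", "label": "a"} or
    {"type": "footnote_reference", "label": "a"}. Labels must be unique.
    """
    numbers = {}
    # First pass: footnotes are numbered in document order.
    for node in nodes:
        if node["type"] == "footnote":
            label = node["label"]
            if label in numbers:
                raise ValueError(f"duplicate footnote label {label!r}")
            numbers[label] = len(numbers) + 1
            node["number"] = numbers[label]
    # Second pass: each reference takes its footnote's number.
    for node in nodes:
        if node["type"] == "footnote_reference":
            label = node["label"]
            if label not in numbers:
                raise ValueError(f"no footnote for label {label!r}")
            node["number"] = numbers[label]
    return nodes
```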

[Tony re "designer"]
> Actually, I think this is the interesting stage to have identified.
> I agree it's difficult to think of a name for it.

Rummaging through my dusty brain, I've come up with several
alternatives to "designer" and "transformer": collator, integrator,
interpreter, synthesist. I think "synthesist" is most apt. However,
the point may be moot, because...

Having just scribbled a diagram (ascii art below), I think the
"synthesist" (transformer/designer) is so tightly coupled to the
reader that it becomes an internal implementation detail::

           +--------+                +-------+
           | reader | -------------> | filer |
           +--------+                +-------+
             /   \\                      |
            /     \\                     |
           /       \\                    |
    +--------+    +------------+     +--------+
    | parser |    | synthesist |     | writer |
    +--------+    +------------+     +--------+

(Double lines denote tight coupling; single lines denote loose
coupling.)

I can see a variety of synthesists for Python source readers, but
other input types (.rtxt files, PEPs, email, etc.) won't need them.
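The diagram suggests two kinds of coupling. The reader is loosely coupled to its parser, since any parser can be handed in. The synthesist, by contrast, is a tightly coupled internal detail and is optional. Here is a minimal sketch of that arrangement. The `Reader`, `parse`, and `synthesize` names and the tuple-based nodes are hypothetical.

```python
class Reader:
    """Hypothetical reader: loosely coupled to a parser, tightly to a synthesist."""

    def __init__(self, parser, synthesist=None):
        self.parser = parser          # any object with .parse(text) -> list of nodes
        self.synthesist = synthesist  # optional; e.g. only for Python source readers

    def read(self, source):
        parsed = [self.parser.parse(d) for d in self.extract_doclets(source)]
        if self.synthesist is None:
            # .rtxt files, PEPs, email: nothing to synthesize, just flatten.
            return [node for doclet in parsed for node in doclet]
        return self.synthesist.synthesize(parsed)

    def extract_doclets(self, source):
        # Simplest case: the whole input is one doclet.
        return [source]


class LineParser:
    """Stand-in parser: one paragraph node per non-blank line."""

    def parse(self, text):
        return [("paragraph", line.strip()) for line in text.splitlines() if line.strip()]


class TitleSynthesist:
    """Stand-in synthesist: promotes the first paragraph to a title."""

    def synthesize(self, parsed_doclets):
        nodes = [node for doclet in parsed_doclets for node in doclet]
        if nodes:
            nodes[0] = ("title", nodes[0][1])
        return nodes
```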

[Tony]
> I assume that you are thus hinting that the output of the designer
> should be "pure" DPS nodes - that is, using only DPS tree nodes that
> are defined in dps/nodes.py, so that *any* DPS writer can be slotted
> in.

Yes, exactly. A generic document is produced and handed over to the
filer & writer.

> At the moment, my code doesn't work like that - the designer is
> producing "extended" nodes, and the writer understands what to do
> with them.

If we follow the diagram above, this goes away. The "pure" document
tree is used between reader, parser, filer, and writer; but the
synthesist is local to the reader and they can share any private
structure they like. Of course, it would be useful for that
structure to be comprehensive and well-documented.
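One way to enforce the "pure" document tree at the reader/filer boundary is to walk the tree and report any node type that a generic writer would not know. The sketch below assumes a hypothetical `Node` class. Its set of generic node names is illustrative and incomplete; the real set would be whatever dps/nodes.py defines.

```python
from dataclasses import dataclass, field

# Illustrative subset only; the authoritative list would come from dps/nodes.py.
GENERIC_NODE_TYPES = {"document", "section", "title", "paragraph", "table", "image"}


@dataclass
class Node:
    tagname: str
    children: list = field(default_factory=list)


def find_non_generic(node, allowed=GENERIC_NODE_TYPES):
    """Return the tag names in the tree that a generic writer would not know."""
    bad = [] if node.tagname in allowed else [node.tagname]
    for child in node.children:
        bad.extend(find_non_generic(child, allowed))
    return bad
```

A private reader/synthesist structure can be as rich as it likes, as long as this check passes on what the reader finally hands over.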

> The example I would use to think about the flow through the system
> would be that of a simple table in the quick reference, which could
> use a directive::
>
>     .. quickreftable:: Directives
>        :link: http://link-to-text
>        ::
>
>          For instance:
>
>            .. graphic:: images/ball1.gif

I would rewrite that as::

    .. quickreftable:: Directives (http://link-to-text)

       For instance:

       .. image:: images/ball1.gif

Decide on a structure for the ``quickreftable`` directive. It can do
with its contents what it likes, including duplicating, parsing,
whatever. Be creative! Check out the directives I've built for
admonitions and images (image [not graphic] and figure).
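Here is a sketch of how a directive might "do with its contents what it likes": a hypothetical handler for the ``quickreftable`` directive in the rewritten form above. It splits the argument into a title and an optional parenthesized link. It then builds a two-column table row, with the raw source on the left and, on the right, whatever a caller-supplied parse function makes of that source. The handler signature, the tuple-based node format, and the column layout are assumptions for illustration, not the DPS directive API.

```python
import re

# "Directives (http://link-to-text)" -> title "Directives", link "http://link-to-text"
ARGUMENT_RE = re.compile(r"^(?P<title>.*?)\s*(?:\((?P<link>\S+)\))?\s*$")


def quickreftable(argument, content_lines, parse):
    """Hypothetical directive handler returning a structured table node.

    ``parse`` stands in for the parser's own entry point, so the directive
    can re-parse its contents while also keeping the raw text, giving the
    "source | result" layout of a quick-reference table.
    """
    match = ARGUMENT_RE.match(argument)
    title, link = match.group("title"), match.group("link")
    source = "\n".join(content_lines)
    header = ("table_header", title, link)
    row = ("table_row", ("literal_block", source), parse(source))
    return ("table", header, row)
```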

> Now, given I want the table header to be in pale blue, with the word
> "Directives" in strong italics, and I want the table body to be
> split 50/50 between the two columns, with a pale yellow background,
> *if* I'm outputting to HTML - how do I do that? Bearing in mind that
> if I'm outputting to PDF, I want an entirely different set of
> details.

Style sheets would be useful for that. HTML has them, and a PDF
generator might too. Or writers might have their own collections of
style modules.
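Here is a minimal sketch of the idea that writers have their own style modules. Each writer keeps its own mapping from element class to presentation, so the document tree never carries colours or column widths. The HTML side below emits one CSS rule per element class, using Tony's example. The class names, property values, and the `css_for` helper are hypothetical. A PDF writer would consult a different mapping for the same element classes.

```python
# Hypothetical HTML style module: element class -> CSS declarations.
HTML_STYLES = {
    "quickreftable-header": {"background-color": "#cce5ff"},  # pale blue
    "quickreftable-title": {"font-weight": "bold", "font-style": "italic"},
    "quickreftable-body": {"background-color": "#ffffcc"},  # pale yellow
    "quickreftable-column": {"width": "50%"},  # two columns split 50/50
}


def css_for(styles):
    """Render a style mapping as a CSS style sheet, one rule per class."""
    rules = []
    for class_name, declarations in styles.items():
        body = "; ".join(f"{prop}: {value}" for prop, value in declarations.items())
        rules.append(f".{class_name} {{ {body}; }}")
    return "\n".join(rules)
```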

> My *suspicion* is that we have three sets of plugins for a directive

Whoa -- too complex. Remember, directives are a parser construct.
They're used to get around the limited syntax. But what comes out of
the parser should have proper structure, not just 'directives'. The
duplication and re-parsing should be done by the directive code being
run by the original parser. If anything else needs to be done, it
should be triggered by the specific element(s) produced by the parser.
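Later processing "should be triggered by the specific element(s) produced by the parser." That maps naturally onto per-element dispatch in the writer: the writer looks up a method named after each node's type. No stage downstream of the parser ever needs to understand a generic "directive" node. Here is a sketch, with hypothetical node and method names.

```python
from html import escape


class Node:
    def __init__(self, tagname, text="", children=()):
        self.tagname = tagname
        self.text = text
        self.children = list(children)


class HTMLWriter:
    """Hypothetical writer that dispatches on each node's tagname."""

    def write(self, node):
        method = getattr(self, "visit_" + node.tagname, None)
        if method is None:
            raise NotImplementedError(f"no handler for {node.tagname!r} nodes")
        return method(node)

    def write_children(self, node):
        return "".join(self.write(child) for child in node.children)

    def visit_document(self, node):
        return self.write_children(node)

    def visit_paragraph(self, node):
        return f"<p>{escape(node.text)}</p>"

    def visit_image(self, node):
        # Here node.text holds the image URI.
        return f'<img src="{escape(node.text)}" alt="">'
```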

> Despite the witterings above, I'm not too concerned about this as
> yet - I have the feeling that it will all come clear in the attempt
> to implement a "clean" system[1]_.

Agreed. But it does help to bash ideas around.

[Garth]
> Is there any problem I've missed that prevents us from spitting out
> plain XML and using transforms to convert it to XHTML? :)

[Tony]
> I would certainly imagine that would be possible, assuming (as I do)
> that all of the DPS node tree information gets dumped as XML
> elements/attributes.

Correct assumption.

Nothing prevents the XML->XHTML route Garth describes: simply use the
XML writer and XSLT style sheets for the transformations. (Remi
Bertholet sent me .xsl and .css files; I'll make them available soon.)
The problem with that approach is that you need software that
understands the style sheets. Certain versions of certain browsers do,
but that's not good enough for the general case. If there were an XSLT
module in the standard library, we could use it. Until then, we have
to be able to produce real HTML.
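Tony assumed that all DPS node information can be dumped as XML elements and attributes, ready for an external XSLT processor. Here is a sketch of that dump using the standard library's `xml.etree.ElementTree`. The dict-based node format is hypothetical; only the ElementTree calls are real API. The XSLT step itself is not shown, since the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert a hypothetical node dict to an ElementTree element.

    A node looks like {"tag": ..., "attributes": {...}, "text": ..., "children": [...]}.
    """
    element = ET.Element(node["tag"], node.get("attributes", {}))
    element.text = node.get("text")
    for child in node.get("children", []):
        element.append(to_xml(child))
    return element


doc = {
    "tag": "document",
    "children": [
        {"tag": "paragraph", "text": "For instance:"},
        {"tag": "image", "attributes": {"uri": "images/ball1.gif"}},
    ],
}
xml_text = ET.tostring(to_xml(doc), encoding="unicode")
```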

[Garth]
> Transformers take a DPS tree and spit out another DPS tree, right?

Correct, modulo the discussion above.

[Garth]
> Is the intent something along the lines of the following? ::
>
>     Writer.write(Transformer.transform(Parser.parse(Reader())))

Perhaps more like::

    Filer.file(Reader.read(inputref, Parser, Synthesist), Writer)

In other words, we pass a parser class (or instance) into the reader
because the parser might be called repeatedly, once for each doclet
(actually, the reader might auto-detect the markup format and load the
parser itself). Whether a Synthesist class/instance is passed at all
would depend on the reader. Same for filers: we pass the writer
class/instance in since it may be used for multiple document
fragments.
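The calling convention above can be sketched in Python. Classes are passed in rather than instances, so the reader can call the parser once per doclet and the filer can reuse one writer for several fragments. Every class here is a hypothetical stand-in; only the shape `Filer.file(Reader.read(inputref, Parser, Synthesist), Writer)` comes from the message. For simplicity, `inputref` is assumed to be the input text itself, with doclets separated by form feeds.

```python
from html import escape


class Parser:
    """Stand-in parser: one paragraph node per non-blank line of a doclet."""

    def parse(self, text):
        return [("paragraph", line.strip()) for line in text.splitlines() if line.strip()]


class Reader:
    @classmethod
    def read(cls, inputref, parser_class, synthesist_class=None):
        # Assumption: inputref is the text, and doclets are separated by "\f".
        parser = parser_class()
        fragments = [parser.parse(doclet) for doclet in inputref.split("\f")]
        if synthesist_class is not None:
            fragments = synthesist_class().synthesize(fragments)
        return fragments


class Writer:
    """Stand-in writer: renders one fragment as HTML paragraphs."""

    def write(self, fragment):
        return "".join(f"<p>{escape(text)}</p>" for _tag, text in fragment)


class Filer:
    @classmethod
    def file(cls, fragments, writer_class):
        # One writer instance is reused for every fragment; results are
        # kept in memory, keyed by fragment index.
        writer = writer_class()
        return {i: writer.write(fragment) for i, fragment in enumerate(fragments)}


result = Filer.file(Reader.read("first doclet\fsecond doclet", Parser), Writer)
```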

[Tony]
> Of course, the other reason one might want a transformer is to amend
> the tree in some manner - for instance, it seems to me that the
> transformer is what would sort out intra-document references...

Good point. Something to consider when we actually tackle such beasts.
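Tony suggested that a transformer would sort out intra-document references. That can be sketched as a tree-to-tree pass. First it collects every target name. Then it returns a new node list in which each reference carries the id of its target. The flat node layout, the `refname`/`refid` attribute names, the case-insensitive matching, and the id scheme are all assumptions for illustration.

```python
def resolve_references(nodes):
    """Hypothetical transform: return a new node list with references resolved.

    Nodes are dicts such as {"type": "target", "name": ...} or
    {"type": "reference", "refname": ...}. The input is not modified.
    """
    # Pass 1: map each target name (case-insensitively) to a generated id.
    ids = {}
    for node in nodes:
        if node["type"] == "target":
            ids.setdefault(node["name"].lower(), f"id{len(ids) + 1}")
    # Pass 2: copy each node, attaching ids to targets and references.
    resolved = []
    for node in nodes:
        node = dict(node)
        if node["type"] == "target":
            node["id"] = ids[node["name"].lower()]
        elif node["type"] == "reference":
            key = node["refname"].lower()
            if key not in ids:
                raise KeyError(f"unknown reference target {node['refname']!r}")
            node["refid"] = ids[key]
        resolved.append(node)
    return resolved
```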

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net