[Doc-SIG] Some random thoughts

Laurence Tratt laurie@eh.org
Sun, 05 Mar 2000 15:48:54 GMT

In message <m12RbXY-000CnDC@artcom0.artcom-gmbh.de>
          pf@artcom-gmbh.de (Peter Funk) wrote:

Thanks for the comments. OK, here goes...

> First I wonder, why you decided  to use John Aycocks Spark as parser
> instead of using the builtin parser provided by the standard library.

I don't. Well I do. But to parse Python files, I use the builtin parser. The
code's in lib/crystal/Languages/Python/Parsers/CPython/ (snappy eh?).

SPARK is used to compile the internal LayoutLanguage, which is its own
little thing.

If it used SPARK to compile Python files you would know about it at
execution time! On my P233, I reckon SPARK munches somewhere just under 1Kb
a second: fine for the Layout Language (which has relatively little input),
but too slow for big Python files. On his website, John mentions the next
version will be faster which would be great. Even with the speed issues,
SPARK is truly great, and so to have speed less of an issue will be cool.

> The structure of your program was somewhat hard to grasp for me

Yes, without documentation it looks like a mess. In reality, there is a
method behind the madness, and all those packages are intended so that
future expansion is easy. For example, I decided to use '_' to indicate a
private or protected (in the Java sense) 'thing', be that an object, module,
class or whatever. A lot of those decisions just aren't apparent in a
straight source release and for that I apologise.

Now that I have written the appropriate tool, I'm working on putting doc
comments into Crystal itself. This should aid things a bit. The package
hierarchy isn't something that's going to (or should) go away, and so that
particular learning curve will remain.

> As far as I can see, the interesting functionality is contained 
> (or may I say somewhat hidden?) in the following two modules:
>    crystal/Defaults/Python_To_HTML3.py
> and
>    crystal/Doc_Parsers/StructuredText/Facade.py

I think I'd go with that, although I would add the file I mentioned earlier
and crystal/Languages/Python/Formatters/Default/_Format.py to that list as
well. There's a lot of other important code strewn around, but those 4 files
are probably the main workhorses.

crystal/Outputters/_Layout_Language.py is pretty important too.

> This was not easy to figure out.

The good things in life never are :)

> As far as I understood your code, the "outputter" has the task to
> transform an intermediate structure tree and produce for example HTML
> using the definitions provided by a dictionary like that in
> Python_To_HTML3.py, right?

The basic stages of Crystal are:

  * munch files using a Languages parser
      * at the same time munch doc comments using a Doc_Parser
  * create a 'logical' page using a Languages Formatter
  * convert the logical page into a physical page using an Outputter and
      the Layout Language to output to disk / screen / whatever

Important concepts are:

  * Language
    eg Python, Java
    A Language contains *everything* specific to a language

  * Doc_Parser
    eg javadoc, StructuredText
    This munches doc comments in a specific style into an internal format

  * Formatter
    eg 'Default', UML
    These live within a Language (they're language specific) and create a
      logically formatted page
    A language doesn't actually have to have a separate Formatter, but it
      would have to have some code which did the same job

  * Outputter
    eg HTML, Text, man
    Utilises the Layout Language to convert a logical page into a physical
      one
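
To make the stages and concepts above a bit more concrete, here is a minimal
Python sketch. Every name in it (LogicalPage, parse_source, the '#:' doc
comment marker and so on) is invented for illustration; none of it is
Crystal's real code or API, and the Layout Language step is replaced by a
hard-wired text Outputter.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalPage:
    """Format-neutral page: a title plus (name, parsed doc) entries."""
    title: str
    entries: list = field(default_factory=list)


def parse_source(text):
    """Stand-in for a Languages parser.

    Finds top-level 'def' names and the '#:' comment directly above
    each one. The '#:' marker is a made-up convention for this sketch.
    """
    items, pending_doc = [], ""
    for line in text.splitlines():
        if line.startswith("#:"):
            pending_doc = line[2:].strip()
        elif line.startswith("def "):
            name = line[4:].split("(")[0]
            items.append((name, pending_doc))
            pending_doc = ""
        else:
            pending_doc = ""
    return items


def parse_doc(comment):
    """Stand-in for a Doc_Parser: doc comment -> internal format."""
    return {"text": comment}


def format_page(title, items):
    """Stand-in for a Language's Formatter: builds the logical page."""
    return LogicalPage(title, [(name, parse_doc(doc)) for name, doc in items])


def output_text(page):
    """Stand-in for an Outputter: logical page -> physical (plain text)."""
    lines = [page.title, "=" * len(page.title)]
    for name, doc in page.entries:
        lines.append(f"{name}: {doc['text']}")
    return "\n".join(lines)


source = "#: Add two numbers.\ndef add(a, b):\n    return a + b\n"
page = format_page("demo", parse_source(source))
rendered = output_text(page)
print(rendered)
```

The point of the split is that an HTML or man Outputter could be swapped in
for output_text without touching the parser or the Formatter.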

> So why do you define your own special purpose language for such
> definitions, when html3_outputter__styles could have been a class
> containing methods for each structure element?  May be I should have spend
> more time on studying your code (which I didn't had).

I am not a fan of Yet Another Little Language, so I didn't take the decision
lightly. Basically, you *could* definitely do everything the Layout Language
does in pure Python but by taking control (or at least the illusion of
control) away from the programmer, I make simple things very easy to write
(that 13 or 14Kb of stuff in Python_To_HTML3.py would IMHO be much larger -
and convoluted - in standard Python) and do some useful little things on
their behalf. Again, undocumented :(

Initially I did do things in pure Python, but I found it forced me to write
un-Pythonic Python, thus satisfying no one. The Layout Language also includes
a few things like type checking (well, the information is in there, even if
the code that does the checking isn't) to make things a bit more strict. I'd
class it as an experimental feature rather than a definite: I'm certainly
open to suggestions.

There is also the fact that I have got a reference manual for the Layout
Language in about 5 sides of A4: for people who don't know Python (the idea
behind the system is that it isn't Python specific), the chances are this
tiny little language (the grammar is about 15 lines with one-rule-per-line)
will be easier to learn than Python. Of course one could say they should
learn Python, but that's another issue <wink>.

[...on the StructuredText module]
>> * It seems to be geared up for subclasses returning strings (I needed a
>>     recursive data structure, not a string representation)
>>   In fact, realistically, the implementation is set up with only HTML in
>>     mind
>> * There's no real documentation for the implementation
>> * The implementation is *very* hard to understand if you haven't watched
>>     it evolve
> Well.  I was able to do my own small modifications on StructuredText.py
> but I agree to some degree with you.  Jim Fultons approach is too heavily
> based in string processing.

I think that's fair enough. I feel fairly certain in saying Jim had HTML as
his target market in mind, and that it fits that bill perfectly. Also, I
doubt he had the eventual continued expansion in mind when he started it in
about '95 / '96!

>>   * The example code protocol is crufty & non-overridable. What should:
>>     """
>>     ...so for example:
>>        * element 1
>>        * element 2
>>     """
>>     do? C++ programs might get caught out with the actions of '::' in
>>       StructuredText <wink>
> Personally I would prefer the idea of Tim Peters doctest for examples.
> He suggests to use the Interpreter prompt as markup for example code:
>       >>> a = FooBar("Baz")
>       >>> a.something_completely_different()
>       >>> a.spam()

I could live with that, definitely: most Pythonic. But only *provided*
the >>> doesn't necessarily come out in the output (unless you switched that
on... Er, that might be tricky to do sensibly). Any ideas?
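
One way the >>> switch might work: pull the examples out of the doc comment
first, then let the output stage decide whether to put the prompt back. A
minimal sketch, assuming a doctest-style parser is to hand (I use the
DocTestParser class from Python's standard doctest module here;
render_examples and its show_prompt flag are my own made-up names, not
anything in Crystal):

```python
import doctest


def split_examples(docstring):
    """Return (source, expected_output) pairs with the prompts removed."""
    parser = doctest.DocTestParser()
    return [
        (example.source.rstrip("\n"), example.want.rstrip("\n"))
        for example in parser.get_examples(docstring)
    ]


def render_examples(docstring, show_prompt=False):
    """Re-emit the examples, with or without the '>>> ' prompt.

    Simplification: only the first line of a multi-line statement gets
    the prompt back; continuation lines are not re-prefixed with '... '.
    """
    prefix = ">>> " if show_prompt else ""
    out = []
    for source, want in split_examples(docstring):
        out.append(prefix + source)
        if want:
            out.append(want)
    return "\n".join(out)


DOC = """
For example:

    >>> a = FooBar("Baz")
    >>> a.something_completely_different()
    >>> a.spam()
"""

print(render_examples(DOC))
print(render_examples(DOC, show_prompt=True))
```

The examples are only parsed, never executed, so FooBar does not have to
exist for this to run.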

>> * Forcing anything between ' ' into <Code> seems particularly clumsy; *
>>     has a good history of being an emphasis effect and ** is a cunning
>>     extension to that, but ' ' seems unnatural
> Agreed.  This has been discussed here before.

Do we have any suggestions for an alternative?

>> * From a purely Python perspective, having _ _ as the underline protocol
>>     tends to cause __init__ type method names to come out somewhat
>>     unexpectedly. But that's not StructuredTexts fault <wink>
> As has been said before, underlining is bad markup anyway.  It is not
> used in any reasonable typeset output.

I think I could live with that.

>> * Should styles nest? So is *this **going** to work* ?
> I would say *not*.

My thinking is that perhaps there need to be two "types" of style. The * **
simple ones, and then something like perhaps <Emph> </Emph> ; the latter ones
could be used in more complex situations and could nest.
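
A rough sketch of how the two types could fit together: the simple markers
are matched in a single left-to-right pass, so they can never nest, and each
one is rewritten into the tag form. This is only an illustration, not what
Crystal or StructuredText does; <Strong> is a tag name I have assumed to go
alongside <Emph>, and the "no whitespace just inside the markers" rule is
also my assumption.

```python
import re

# One alternation, scanned left to right: '**strong**' is tried first,
# then '*emph*'. The marked text may not contain '*' and may not start
# or end with whitespace, so the simple markers cannot nest.
_SIMPLE_STYLE = re.compile(
    r"\*\*(?=\S)(?P<strong>[^*]+)(?<=\S)\*\*"
    r"|\*(?=\S)(?P<emph>[^*]+)(?<=\S)\*"
)


def _to_tag(match):
    """Rewrite one simple marker pair into its tag form."""
    if match.group("strong") is not None:
        return f"<Strong>{match.group('strong')}</Strong>"
    return f"<Emph>{match.group('emph')}</Emph>"


def simple_styles_to_tags(text):
    """Convert non-nesting * and ** markers into tags."""
    return _SIMPLE_STYLE.sub(_to_tag, text)


print(simple_styles_to_tags("a *b* and **c**"))
print(simple_styles_to_tags("*this **going** to work*"))
```

In the second call only the inner ** pair is converted and the outer * pair
is left as literal text; anyone who really wants nesting would write the
tags out by hand.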

> section headlines
> -----------------
> I think it would buy much, if a few additional features will be added to
> the structured text idea:  lists are fine, but another important doc
> structure element are section and subsection headers.  Often they are
> marked up as a line of text containing the headline followed by line of
> '-' with the same length.  Both lines may be indented.  The standard
> library modules 'cgi', 'pipes' and 'pprint' contain examples of this kind
> of markup.

I am currently doing some fence sitting on section headlines and sections
and subsections etc. I don't know how to strike the right balance between
good quality output and ease of use / readability in the code in this
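
For what it's worth, the headline rule Peter describes (a line of text
followed by a line of '-' of the same length, both possibly indented) is
easy enough to detect. A minimal sketch, assuming both lines must share the
same indentation; find_headlines is a made-up name and this is not part of
any existing StructuredText code:

```python
def find_headlines(text):
    """Return (line_index, title) for each line underlined with '-'.

    A headline is a non-empty line followed by a line made up only of
    '-' characters, with the same length and the same indentation.
    """
    lines = text.splitlines()
    found = []
    for i in range(len(lines) - 1):
        title, rule = lines[i], lines[i + 1]
        title_text, rule_text = title.strip(), rule.strip()
        same_indent = (len(title) - len(title.lstrip())
                       == len(rule) - len(rule.lstrip()))
        if (title_text
                and rule_text
                and set(rule_text) == {"-"}
                and len(rule_text) == len(title_text)
                and same_indent):
            found.append((i, title_text))
    return found


SAMPLE = "\n".join([
    "Intro text.",
    "",
    "  section headlines",
    "  " + "-" * len("section headlines"),
    "Body text.",
    "",
    "not a headline",
    "----",
])

print(find_headlines(SAMPLE))
```

The last pair in SAMPLE is rejected because the rule is shorter than the
text above it.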

One thing that's surprised me is that I thought people were going to tear me
to shreds over the HTML output (in terms of both the quality and the actual
look in the browser) and ignore the source (which is still rough) whereas so
far it's been mostly the other way around. Shows you how wrong I can be :)

we're-going-somewhere-now-though-ly y'rs Laurie