[Doc-SIG] ST and DOM

Edward D. Loper edloper@gradient.cis.upenn.edu
Fri, 23 Mar 2001 08:38:59 EST


So I was just looking through the XHTML DTD, and it doesn't really
seem like what we want.  But Tibs's points about the DTD representation
being important as a well-defined interface to ST are well taken.
Thus, I'd like to hash out some of the issues involved so I can
put the appropriate material in my PEP. :)

For now, I want to *only* consider global formatting.  We'll get to
local formatting (i.e., colorising) later. :)

There are two basic types of global formatting elements: basic
elements (which are atomic, as far as global formatting goes) and
hierarchical elements (which are not).

I really think that the DOM tree should capture the *structure* of
the formatted string.  To me, that means it's weird to define a list
item as "a text block that *starts* a list item"; the list item
should *contain* its text blocks.  Anyway, I propose that we use
something similar to the following scheme:

Basic units::

    <!ELEMENT paragraph ...>
    <!ELEMENT bullet ...>
    <!ELEMENT literalblock ...>
    <!ELEMENT doctestblock ...>
    <!ELEMENT label ...>
    <!ELEMENT anchor ...>

Hierarchical units::

    <!ELEMENT structuredtext ((section | paragraph | list |
                               literalblock | doctestblock | 
                               labelsection)*, 
                              anchorsection*)>
    <!ELEMENT section (heading, 
                       (section | paragraph | list |
                        literalblock | doctestblock)+)>
    <!ELEMENT list (listitem+)>
    <!ELEMENT listitem (bullet, 
                        (paragraph | list |
                         literalblock | doctestblock)*)>
    <!ELEMENT anchorsection (anchor, 
                             (paragraph | list |
                              literalblock | doctestblock)*)>
    <!ELEMENT labelsection (label, 
                            (section | paragraph | list |
                             literalblock | doctestblock)+)>
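
Since DTD content models are just regular expressions over element
names, a tool could check a DOM tree against the hierarchical units
above without a full DTD validator.  Here's a rough sketch of that
idea; it's my own assumption, not part of the proposal: it uses
Python's `re` and `xml.dom.minidom`, the helper names `one_of`,
`child_tags`, and `validate` are made up, the basic units are treated
as opaque, and `heading` is assumed to be a basic unit since the
scheme uses it without declaring it.

```python
import re
from xml.dom import minidom

# Hypothetical helper: a regex alternation matching exactly one child tag.
# Every tag in the "child string" is followed by a single space, so each
# alternative consumes one whole tag name.
def one_of(*tags):
    return "(?:" + "|".join(re.escape(tag) + " " for tag in tags) + ")"

BODY = ("paragraph", "list", "literalblock", "doctestblock")

# Content models transcribed from the proposed DTD (hierarchical units only;
# basic units such as <paragraph> and <bullet> are not checked here).
CONTENT_MODELS = {
    "structuredtext": one_of("section", *BODY, "labelsection") + "*"
                      + one_of("anchorsection") + "*",
    "section": "heading " + one_of("section", *BODY) + "+",
    "list": one_of("listitem") + "+",
    "listitem": "bullet " + one_of(*BODY) + "*",
    "anchorsection": "anchor " + one_of(*BODY) + "*",
    "labelsection": "label " + one_of("section", *BODY) + "+",
}

def child_tags(node):
    """Tag names of node's element children, each followed by a space."""
    return "".join(child.tagName + " " for child in node.childNodes
                   if child.nodeType == child.ELEMENT_NODE)

def validate(node, errors=None):
    """Check each hierarchical element against its content model; return errors."""
    if errors is None:
        errors = []
    model = CONTENT_MODELS.get(node.tagName)
    if model is None:
        return errors  # basic unit: contents are opaque in this sketch
    children = child_tags(node)
    if re.fullmatch(model, children) is None:
        errors.append(f"<{node.tagName}> has invalid children: {children.split()}")
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            validate(child, errors)
    return errors

good = minidom.parseString(
    "<structuredtext>"
    "<paragraph>Intro</paragraph>"
    "<list><listitem><bullet>*</bullet><paragraph>Item</paragraph></listitem></list>"
    "<anchorsection><anchor>[1]</anchor><paragraph>Ref</paragraph></anchorsection>"
    "</structuredtext>")
bad = minidom.parseString(
    "<structuredtext>"
    "<anchorsection><anchor>[1]</anchor></anchorsection>"
    "<paragraph>Too late</paragraph>"
    "</structuredtext>")

print(validate(good.documentElement))  # []
print(validate(bad.documentElement))   # one error: paragraph after anchorsection
```

The nice thing about this approach is that the top-level-only and
"anchorsections come last" rules fall out of the content models
themselves; nothing special-cases them.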

Some notes on this scheme (some of these might end up getting
changed):

  * labelsection can only appear at the top level.
  * anchorsection can only appear at the top level, and only after
    all other elements of structuredtext.
  * list items may not contain sections, but they can contain
    just about anything else (except top-level-only elements).
  * anchor sections may not contain sections, but they can
    contain just about anything else (except top-level-only
    elements).
  * labelsections can contain anything except top-level-only
    elements.  However, particular labels may place further
    restrictions on their contents.
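
The placement notes can also be checked by walking the tree
directly, which would give friendlier error messages than a generic
validator.  Again, this is only a hypothetical Python sketch using
`xml.dom.minidom`; `check_placement` and the two rule sets are names
I made up for illustration, and it doesn't try to handle the
per-label restrictions mentioned in the last note.

```python
from xml.dom import minidom

# Element names taken from the proposed DTD.
TOP_LEVEL_ONLY = {"labelsection", "anchorsection"}
NO_SECTIONS_INSIDE = {"listitem", "anchorsection"}

def element_children(node):
    """Element children of a DOM node, skipping text and comment nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def check_placement(root):
    """Return violations of the placement notes for a <structuredtext> root."""
    errors = []

    # anchorsections must follow all other top-level elements.
    seen_anchor = False
    for child in element_children(root):
        if child.tagName == "anchorsection":
            seen_anchor = True
        elif seen_anchor:
            errors.append(f"<{child.tagName}> appears after an <anchorsection>")

    # Walk the tree, tracking the tags between the root and each node.
    def walk(node, path):
        for child in element_children(node):
            tag = child.tagName
            if tag in TOP_LEVEL_ONLY and path:
                errors.append(f"<{tag}> nested inside {'/'.join(path)}")
            if tag == "section" and NO_SECTIONS_INSIDE.intersection(path):
                errors.append(f"<section> nested inside {'/'.join(path)}")
            walk(child, path + [tag])

    walk(root, [])
    return errors

doc = minidom.parseString(
    "<structuredtext>"
    "<list><listitem><bullet>-</bullet>"
    "<section><heading>H</heading></section></listitem></list>"
    "<anchorsection><anchor>[1]</anchor>"
    "<labelsection><label>Note</label></labelsection></anchorsection>"
    "<paragraph>Too late</paragraph>"
    "</structuredtext>")
for problem in check_placement(doc.documentElement):
    print(problem)
# <paragraph> appears after an <anchorsection>
# <section> nested inside list/listitem
# <labelsection> nested inside anchorsection
```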

Now, this is not meant to be a final DTD.  For example, it might
make sense to split list, listitem, and bullet into three variants
each (dlist, olist, ulist, etc.).  But does this *overall* structure
seem reasonable?

For comparison, Tibs has a DTD at the bottom of
<http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html>,
although I'm not sure it's up to date: it seems to contradict
some of the things he's been saying on Doc-SIG lately (??).

-Edward