[Doc-SIG] ST and DOM
Edward D. Loper
edloper@gradient.cis.upenn.edu
Fri, 23 Mar 2001 08:38:59 EST
So I was just looking through the XHTML DTD, and it doesn't really
seem like what we want. But Tib's points about the DTD representation
being important as a well-defined interface to ST are well taken.
Thus, I'd like to hash out some of the involved issues so I can
put the appropriate stuff in my PEP. :)
For now, I want to *only* consider global formatting. We'll get to
local formatting (=colorising) later. :)
There are two basic types of global formatting element: basic
elements (which are atomic, as far as global formatting goes)
and hierarchical elements (which are not).
I really think that the DOM tree should capture the *structure* of
the formatted string. To me, that means that it's weird to define
a list item to be "a text block that *starts* a list item".
Anyway, I propose that we use something similar to the following
scheme:
Basic units::
  <!ELEMENT paragraph ...>
  <!ELEMENT bullet ...>
  <!ELEMENT literalblock ...>
  <!ELEMENT doctestblock ...>
  <!ELEMENT label ...>
  <!ELEMENT anchor ...>
Hierarchical units::
  <!ELEMENT structuredtext ((section | paragraph | list |
                             literalblock | doctestblock |
                             labelsection)*,
                            anchorsection*)>
  <!ELEMENT section (heading,
                     (section | paragraph | list |
                      literalblock | doctestblock)+)>
  <!ELEMENT list (listitem+)>
  <!ELEMENT listitem (bullet,
                      (paragraph | list |
                       literalblock | doctestblock)*)>
  <!ELEMENT anchorsection (anchor,
                           (paragraph | list |
                            literalblock | doctestblock)*)>
  <!ELEMENT labelsection (label,
                          (section | paragraph | list |
                           literalblock | doctestblock)+)>
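To make the hierarchical units a bit more concrete, here is a rough
Python sketch (not part of the proposal itself) that builds a small
tree with the standard xml.etree.ElementTree module and checks each
element's direct children against a table derived from the
declarations above. The names BLOCKS, CONTENT_MODEL, build_sample and
children_ok are made up for illustration, and the check only looks at
element names and the required first child; it ignores the "+" vs "*"
cardinality and does not cover structuredtext itself.

```python
import xml.etree.ElementTree as ET

# Block-level children shared by most containers in the proposed scheme.
BLOCKS = {"paragraph", "list", "literalblock", "doctestblock"}

# Hypothetical table: element name -> (required first child, allowed others).
CONTENT_MODEL = {
    "section": ("heading", BLOCKS | {"section"}),
    "list": (None, {"listitem"}),
    "listitem": ("bullet", BLOCKS),
    "anchorsection": ("anchor", BLOCKS),
    "labelsection": ("label", BLOCKS | {"section"}),
}


def build_sample():
    """Build a tiny document: one section holding a paragraph and a list."""
    root = ET.Element("structuredtext")
    section = ET.SubElement(root, "section")
    ET.SubElement(section, "heading").text = "Usage"
    ET.SubElement(section, "paragraph").text = "Some text."
    lst = ET.SubElement(section, "list")
    item = ET.SubElement(lst, "listitem")
    ET.SubElement(item, "bullet").text = "*"
    ET.SubElement(item, "paragraph").text = "First point."
    return root


def children_ok(element):
    """Return True if the direct children fit CONTENT_MODEL (names only)."""
    if element.tag not in CONTENT_MODEL:
        return True  # basic units and the root are not checked here
    first, allowed = CONTENT_MODEL[element.tag]
    names = [child.tag for child in element]
    if first is not None:
        if not names or names[0] != first:
            return False
        names = names[1:]
    return all(name in allowed for name in names)


if __name__ == "__main__":
    sample = build_sample()
    print(ET.tostring(sample, encoding="unicode"))
    print(all(children_ok(e) for e in sample.iter()))
```

The point of the table is that the tree mirrors the nesting of the
formatted string: a listitem *contains* its bullet and its blocks,
rather than being a text block that merely starts an item.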
Some notes on this scheme (some of these might end up getting
changed):
  * labelsection can only appear at top-level
  * anchorsection can only appear at top-level, and after all
    other elements of structuredtext.
  * list items may not contain sections; but they can contain
    just about anything else (except top-level-only things).
  * anchor sections may not contain sections; but they can
    contain just about anything else (except top-level-only
    things).
  * labelsections can contain anything except top-level-only
    things. However, particular labels may place further
    restrictions on their contents.
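The notes above could be checked mechanically on a parsed tree. The
following is only a sketch under my own assumptions: it uses the
standard xml.etree.ElementTree module, and check_notes,
TOP_LEVEL_ONLY and NO_SECTIONS_INSIDE are hypothetical names, not
anything from an existing ST tool. It covers the top-level-only rule,
the "anchorsections come last" rule, and the "no sections directly
inside list items or anchor sections" rule; per-label restrictions
are not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for the element groups mentioned in the notes.
TOP_LEVEL_ONLY = {"labelsection", "anchorsection"}
NO_SECTIONS_INSIDE = {"listitem", "anchorsection"}


def check_notes(root):
    """Return a list of human-readable violations found under root."""
    problems = []

    # Top-level-only elements must be direct children of the root.
    for parent in root.iter():
        if parent is root:
            continue
        for child in parent:
            if child.tag in TOP_LEVEL_ONLY:
                problems.append(f"{child.tag} nested inside {parent.tag}")

    # anchorsections must come after every other top-level element.
    seen_anchor = False
    for child in root:
        if child.tag == "anchorsection":
            seen_anchor = True
        elif seen_anchor:
            problems.append(f"{child.tag} follows an anchorsection")

    # List items and anchor sections may not directly contain sections.
    for container in root.iter():
        if container.tag in NO_SECTIONS_INSIDE:
            for child in container:
                if child.tag == "section":
                    problems.append(f"section inside {container.tag}")

    return problems


if __name__ == "__main__":
    doc = ET.fromstring(
        "<structuredtext>"
        "<anchorsection><anchor/></anchorsection>"
        "<paragraph/>"
        "</structuredtext>"
    )
    for problem in check_notes(doc):
        print(problem)
```

Most of these rules already fall out of the content models in the
DTD, so a validating parser would catch them too; the sketch just
spells them out one by one.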
Now, this is not meant to be a final DTD. For example, it might
make sense to split each of list, listitem, and bullet into 3
variants (dlist, olist, ulist, etc.). But does this *overall*
structure seem reasonable?
For comparison, Tibs has a DTD at the bottom of
<http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html>,
although I'm not sure if it's up-to-date. It seems to go against
some of the things he's been saying on doc-sig lately (??).
-Edward