[Doc-SIG] Structured Text
Edward D. Loper
edloper@gradient.cis.upenn.edu
Mon, 05 Mar 2001 20:46:41 EST
I've been going over the definitions of structured text (and its
various flavors), trying to see if I can formalize it even more than
Tibs did (http://homepage.ntlworld.com/tibsnjoan/STNG-format.html and
http://homepage.ntlworld.com/tibsnjoan/docutils/STpy.html), and a
number of questions came up. I'm not sure whether this is the correct
forum for such questions. If not, I apologize, and would appreciate
it if you could tell me whom I should ask. Anyway, my questions
are:
1. Does every string value have an interpretation as a Structured
Text? That seems to be the case. If so, is that a Good Thing?
As an example of a string that we might not want to give a value,
consider:
||  indent level 0
||
||      indent level 1
||
||          indent level 2
||
||        indent level ??
I'd really prefer not to have cases like this have "undefined
semantics." It seems like we either need to specify what they
mean, or say that they're illegal.
2. If it is true that every string value has an interpretation as a
Structured Text, does it make sense to officially "discourage"
certain types of strings, such as the example listed above? It
might also make sense to discourage strings like:
|| this
|| is
|| one messed up
|| paragraph
3. Which types of "code coloring" (emph, inline, etc.) can "wrap" over
lines, and which can't? E.g., can I have an *emph statement that
continues to the next line?*
4. Is there any official precedence ordering on the different types of
"code coloring?" Will there be anytime soon? Any rules about what
types of code coloring can be contained in what other types?
5. Does structural formatting or code coloring take precedence? For
example, if a paragraph starts with "* foo *," will it be a normal
paragraph with an emphasized first element, or a list item? (It'll
be much easier for me to write formal rules if structure takes
precedence. ;) )
6. Among the list types, which take precedence? For example, if a
paragraph starts with "1. foo -- bar", is it an ordered list item
or a descriptive list item?
7. What is meant by saying that SGML text passes through? SGML is a
framework for defining mark-up languages rather than a mark-up language
itself, so I assume that the intent is something
like "XML and HTML text passes through." But does that mean that
in an expression like '<TAG>a*b*</TAG>', the '*'s will be ignored?
That seems unreasonably difficult to implement. What about an
expression like '<T a="*x*"/>'? Does this mean I can't write things
like 'if x<y *and* y>z'? Is there strong support for the
notion of letting "SGML" text pass through, or is it something that
might be dropped? (I would certainly vote for dropping it. :) )
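To make questions 5 and 6 concrete, here is a minimal Python sketch of
what "structure takes precedence" could look like. The rule order, the
rule names, and the regular expressions are all assumptions made for
illustration; no StructuredText definition I know of states them.

```python
import re

# Hypothetical precedence table: earlier rules win. Both the ordering and
# the patterns are assumptions for illustration only.
RULES = [
    ("ordered_item", re.compile(r"^\d+\.\s+\S")),
    ("bullet_item", re.compile(r"^[*-]\s+\S")),
    ("descriptive_item", re.compile(r"^\S.*?\s--\s+\S")),
]


def classify_paragraph(text):
    """Return the name of the first structural rule matching the paragraph.

    Only the first line is inspected. Inline markup such as *emphasis*
    is never consulted, so structure always wins over code coloring.
    """
    first_line = text.lstrip().split("\n", 1)[0]
    for name, pattern in RULES:
        if pattern.match(first_line):
            return name
    return "paragraph"
```

Under these assumed rules, a paragraph starting with "* foo *" is a
bullet item and one starting with "1. foo -- bar" is an ordered item,
because those rules are tried before the descriptive one. A different
ordering would give different answers, which is exactly why the
precedence needs to be written down somewhere.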
My eventual goal, to the extent that it's possible, is to write out a
complete formal specification for StructuredText using something
similar to BNF (Backus-Naur Form). (I'm pretty sure that vanilla BNF
is not powerful enough to capture StructuredText.) After I've done
that, I'll start working on getting Emacs to colorize StructuredText
strings. I'd also like to create a test suite of strings to check how
different implementations behave on "ambiguously defined" cases.
Any help and/or pointers are very much appreciated. :)
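As a sketch of one possible answer to question 1, and of why plain BNF
needs help with indentation: the text can be preprocessed into INDENT
and DEDENT tokens with a stack of indentation widths, the way Python's
own tokenizer does, and a dedent to a width that is not on the stack
can be declared illegal. The function name and token shapes below are
hypothetical, and treating the bad dedent as an error is just one
choice, not established StructuredText behavior.

```python
def indent_tokens(text):
    """Turn leading-space indentation into INDENT/DEDENT tokens.

    Returns a list of "INDENT", "DEDENT" and ("LINE", content) entries.
    Raises ValueError when a line dedents to a width that matches no
    enclosing indentation level. Only spaces count as indentation here.
    """
    stack = [0]  # indentation widths of the currently open levels
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        else:
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise ValueError(
                    f"inconsistent dedent to column {width}: {line.strip()!r}"
                )
        tokens.append(("LINE", line.strip()))
    # Close any levels still open at the end of the text.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens
```

With this approach a grammar over the token stream can stay close to
BNF, and a line like the "indent level ??" one raises an error instead
of getting an accidental interpretation. Each such string could also
serve as one entry in a test suite of ambiguous cases.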
-Edward