[Doc-SIG] formalizing StructuredText
Edward D. Loper
Thu, 15 Mar 2001 16:37:46 EST
I've been working on expanding the domain of STminus (a formalized
version of StructuredText, expressed in an EBNF variant), and
the following questions came up. (Some of them may not make much
sense if you're not familiar with StructuredText.) These are
generally not questions that have "correct" answers, so I'm
wondering what people think I should make STminus do. (Of course
I'm interested in what STpy and STNG have to say about these
questions.)
* Are list items required to have contents? I.e., can a list
item be just a bullet? This only makes sense to me if you
used it in an environment like::
* Apostrophes can appear in the middle of a word or at the end
of a word, like "isn't" and "dogs'". Is it illegal to have
multiple apostrophes in the same word? There are no English
words that use multiple apostrophes, but I'm not sure about
other languages. (There are probably also some languages that
have words with apostrophes at the beginning of a word
("'til"?), and StructuredText clearly won't deal with those.)
* When parsing various structures, like paragraphs and list
items and bold items, what whitespace is kept? E.g., if I
were to export to XML, would the trailing whitespace on
paragraphs be included? Or the whitespace between a
description list key and the hyphen?
* Can #inline# expressions contain newlines? I assume not
('literal' expressions can't).
* What are valid expressions for starting an ordered list item?
Currently STNG uses "([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)"
i.e., a series of letters followed by a dot, a series of
numbers followed by a dot, or a number followed by space.
This seems wrong to me, because it implies that the following
are ordered list items::
Hi. This is a list item.
12 is a fun number.
And it does not allow for expressions like::
1.2. This is a list item.
Also, note that in STpy variants (which will include
my proposed markup for formatted docstrings), list items can
begin without an intervening space, so we would get::
The first line is a paragraph, but the second line is a list
item (since it starts with letters followed by a dot).
Even if we restrict ourselves to Roman numerals, we have
Hopefully someone smarter than I am can figure this out,
but I don't see a way to use Roman numerals safely.
So maybe we could just use "([0-9]+\.)+"?
* What restrictions are there on hrefs ("name":http://some.url)?
According to STNG, they can use relative URLs ("name":whatever).
These end up being pretty tricky to formalize..
* Can href names span multiple lines?
* Can href names contain coloring? (I'd like to say no)
* Should the string '":' only be allowed for hrefs?
Or maybe '":(?!\s)', so you can say "this": that?
* What do you do with things like::
This *is "too* confusing":http://some.url
(Keeping in mind that things like this should be ok)::
Normally *quotes " don't have* any special meaning,"
so they don't have to nest properly..
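
As a rough sketch (not STNG's or STminus's actual code), the two regex questions above can be tried out in Python. The names `STNG_ITEM`, `DOTTED_ITEM`, `HREF_DELIM` and `bullet` are made up for illustration, and I'm assuming the list patterns are applied as a prefix match at the start of a line, which the discussion above doesn't specify.

```python
import re

# Pattern quoted above as the one STNG currently uses for ordered list items.
STNG_ITEM = re.compile(r"([a-zA-Z]+\.)|([0-9]+\.)|([0-9]+\s+)")
# Stricter alternative floated above: one or more "digits + dot" groups.
DOTTED_ITEM = re.compile(r"([0-9]+\.)+")
# Href delimiter variant from above: '":' not followed by whitespace.
HREF_DELIM = re.compile(r'":(?!\s)')


def bullet(pattern, line):
    """Return the prefix of `line` that `pattern` would treat as a bullet,
    or None if the line does not start with one (assumed prefix match)."""
    m = pattern.match(line)
    return m.group(0) if m else None


samples = [
    "Hi. This is a list item.",
    "12 is a fun number.",
    "1.2. This is a list item.",
]
for line in samples:
    print(f"{line!r}: STNG={bullet(STNG_ITEM, line)!r} "
          f"dotted={bullet(DOTTED_ITEM, line)!r}")

# The lookahead keeps '"this": that' from being read as an href.
for text in ['"this": that', '"name":http://some.url']:
    print(f"{text!r}: href delimiter found = {HREF_DELIM.search(text) is not None}")
```

Under that prefix-match assumption, the STNG pattern takes "Hi." and "12 " as bullets and only consumes "1." of "1.2.", while the dotted pattern rejects the first two lines and takes "1.2." whole. Neither pattern requires whitespace after the bullet, so a real grammar would presumably need to add that.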
Well, that's all for now. I'll post more issues as they come up. :)