[Doc-SIG] formalizing StructuredText

Tony J Ibbs (Tibs) tony@lsl.co.uk
Mon, 19 Mar 2001 09:53:35 -0000

After a weekend where *some* work got done, significant points:

1. Newlines are preserved again in non-literal paragraphs (Edward Loper
convinced me that the benefits outweighed the problems).
2. Newlines are not allowed within literal and Python literal strings.
3. Local references (which look like '[this]' or '[1]') are now
supported. The "anchor" for a local reference must be at the start of a
paragraph (in future releases I would expect it to *start* a new
paragraph if at the start of a line), and looks like::


4. List items and local references may be "empty" paragraphs, but there
may still be some unresolved issues with respect to newlines - I'm not
sure that::

	  Some text

is allowed (it probably should be, if the form with a blank line between
those two lines *is* allowed).
5. The RE used for detecting URLs has become more sophisticated, and
there are some associated rules. First, "odd" characters (which will be
listed in the documentation) must be escaped, either as '&entity;' or as
'%xx'. Secondly, only a select group of characters may form the *last*
character of a URL - essentially [0-9A-Za-z/], or something like that.
This means that "normal punctuation" cannot end a URL (I don't regard
URLs that end in punctuation as very common!), and thus
'http://www.fred.jim/.' unambiguously ends a sentence with that full
stop; it is not part of the URL. This is a Good Thing.
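
As a rough illustration of the last-character rule in point 5, here is
a minimal sketch. It is not the RE actually used: the list of schemes,
the character class for the URL body, and the names URL_RE and
find_urls are all assumptions made for the example. Only the final
character class, [0-9A-Za-z/], comes from the description above.

```python
import re

# Hypothetical pattern for illustration only; the real RE is not shown
# in this message and is certainly more elaborate.
URL_RE = re.compile(
    r"""
    (?:http|https|ftp)://   # scheme (an assumed, deliberately short list)
    [^\s'"<>]*              # URL body: no whitespace, quotes or angle brackets
    [0-9A-Za-z/]            # the last character must come from the restricted set
    """,
    re.VERBOSE,
)


def find_urls(text):
    """Return every substring of text that the sketch pattern treats as a URL."""
    return URL_RE.findall(text)


# The trailing full stop is left for the sentence, not taken into the URL.
print(find_urls("See 'http://www.fred.jim/.'"))
```

Because the body is matched greedily and the final character is then
forced into the restricted set, the match backs off past any trailing
punctuation, which is what makes the full stop unambiguous.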

The following are probably mostly in response to Edward Loper:

I said that with REs you didn't detect errors:

> Well, it depends on how you're detecting errors...
> > 	plain: 'This '
> > 	emph:  'is "too'
> > 	plain: ' confusing":http://some.url'
> Here, you could say that the string '":' without a matching '"'
> is illegal, and raise an error..

That approach is what I meant when I talked about "a long RE for
detecting common errors", and it is a sensible approach *if one is
validating* - but the results should be warnings, 'cos one of the points
of ST, originally, is that users should be able to "push the corners" a

> But from the point of view of someone formalizing the language, saying
> "there's an ambiguity" is no good.  I have to either explicitly say
> "it's illegal" (=undefined) or "xyz is the correct answer."

Oh, I agree, and it's a good thing to do. But you *do* have a third
option, which is the "this behaviour produces undefined results", which
is not *quite* the same as "illegal".
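
To make the "warn, don't reject" idea concrete for the '":' case quoted
earlier, here is a small sketch. It assumes the check runs on one
already-split text segment at a time (as in the plain/emph breakdown
above), and the function names are hypothetical. It uses a simple scan
rather than a long RE, and it only reports a '":' with no opening
quote; it does not try to catch other mistakes.

```python
import warnings


def dangling_link_quotes(segment):
    """Return offsets of each '":' in segment that has no opening '"' before it."""
    offsets = []
    quote_open = False
    for i, ch in enumerate(segment):
        if ch != '"':
            continue
        closes_link = segment[i + 1:i + 2] == ":"
        if closes_link and not quote_open:
            # A link-closing quote with nothing open: suspicious.
            offsets.append(i)
        else:
            # Ordinary opening or closing quote: just toggle the state.
            quote_open = not quote_open
    return offsets


def warn_on_dangling_quotes(segment):
    """Emit a warning, not an error, for each suspicious '":' in segment."""
    for offset in dangling_link_quotes(segment):
        warnings.warn("'\":' at offset %d has no opening '\"'" % offset)


warn_on_dangling_quotes(' confusing":http://some.url')
```

Issuing a warning leaves it to the caller to decide whether the text is
treated as illegal or merely as producing undefined results.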

> p.s., I'm not sure it's safe for us both to be writing email at the
> same time.  We might overload other peoples' mailboxes. :)

Hmm. Of course, it's an attempt at a compromise between a private
conversation, and a public dialogue that other people can chip into. Not
a very *good* compromise, necessarily...

(and damn, folding messages together clearly isn't going to work without
spending some serious time on it, so it's back to the cacophony, I'm


Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)