[Doc-SIG] formalizing StructuredText

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 23 Mar 2001 10:34:29 -0000

Edward D. Loper wrote:
> > No, within lines spaces are (very carefully) left untouched, just in
> > case.
> That seems inconsistent with stripping trailing whitespace.

Actually, I agree, although I hadn't thought of it until you said so.

> I want
> to make sure that things look "the same" whether you view them in
> text or in html.. So it seems like if you print out a STpy string
> in vt100 mode, with a 130-character-wide screen, it should collapse
> spaces & re-word-wrap, just like HTML/LaTeX would..  Of course, you
> could say that that's just a presentation issue.. But I think that
> for consistency, we should either:
>     1. Preserve spaces within lines *and* trailing whitespace.
>        Specify that display tools (even plaintext ones) should treat
>        spaces as soft, and should word-wrap.

I remember when most tools would show trailing whitespace visibly. Those
days appear to have gone a long time ago. I'd oppose retaining trailing

>     2. Remove leading/trailing whitespace, and collapse sequences
>        of spaces to single spaces.  Specify that display tools
>        should word-wrap.  (of course, you don't collapse spaces in
>        literal regions, inline regions, literal blocks, or python
>        test blocks.)

Word wrapping is a presentation issue - if the renderer is generating
etext, or STNG, then it may make sense to *not* word wrap.
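
As a rough illustration of option 2, here is a minimal sketch that
collapses runs of spaces outside literal regions and keeps word wrapping
as a separate, optional renderer step. The names `collapse_spaces` and
`render_plain` are hypothetical, and treating every 'single-quoted' span
as a literal is a simplifying assumption (it ignores apostrophes, literal
blocks and Python test blocks).

```python
import re
import textwrap

# Assumption for this sketch: a literal region is any 'single-quoted' span.
LITERAL = re.compile(r"('[^']*')")


def collapse_spaces(paragraph):
    """Strip leading/trailing whitespace and collapse runs of spaces,
    leaving the contents of literal regions untouched."""
    parts = LITERAL.split(paragraph.strip())
    # re.split with a capturing group puts the literals at odd indices.
    return "".join(
        part if index % 2 else re.sub(r" {2,}", " ", part)
        for index, part in enumerate(parts)
    )


def render_plain(paragraph, width=None):
    """Wrap only if the renderer asks for it; wrapping is presentation."""
    text = collapse_spaces(paragraph)
    return textwrap.fill(text, width) if width else text
```

A renderer that should not wrap simply calls `render_plain(text)` with no
width.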

> (unless, of course, you can give me a good reason why we *should*
> preserve sequences of spaces.)

No. It's only laziness. So far as I can see, it doesn't actually make
any difference in any circumstance, so there's no point in my
bothering to remove the spaces internally - I don't care about them,
whereas I *have* to care about leading spaces, and I remove trailing
spaces out of kindness (so that '::' works better, for instance - it
doesn't suffer from the "oh dear, that backslash didn't continue my
Python line because there's an invisible space after it" problem).
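
The whitespace policy described above could be sketched like this. It is
only an illustration with hypothetical helper names, not the actual
implementation: trailing whitespace is dropped, leading spaces are kept
because they carry structure, and spaces inside the line are left alone.

```python
def normalize_line(line):
    """Drop trailing whitespace; keep indentation and internal spaces."""
    return line.rstrip()


def indentation(line):
    """Number of leading spaces, which is what the parser has to care about."""
    stripped = normalize_line(line)
    return len(stripped) - len(stripped.lstrip(" "))


def introduces_literal_block(line):
    """A paragraph ending in '::' is detected even with invisible
    trailing spaces after it."""
    return normalize_line(line).endswith("::")
```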

> > > > > > >     "the following is not a url":<hi>
> > > >
> > > > That's right. In this instance.
> > >
> > > So does it get rendered as is (i.e., with two quote signs, one
> > > colon sign, a less than sign, and a greater than sign)?
> >
> > That's up to the renderer.
> But "<hi>" doesn't match the url pattern, so presumably it doesn't
> even get detected by the href-finding-regexp?  As I understand it,
> you can say::

Sorry - I slipped into "if <hi> were a meta-rendition of a URL" mode.
You're right, it would be stored "as is".

> > >   2. nothing can nest within itself (even with intervening levels)
> >
> > Pragmatically has to be true, with non-differentiated start and end
> > quotes.
> It doesn't *have* to be true.. In principle we could allow::
>   *This **is *no* good** for me*

I suppose so, for one definition of how one would parse it (he said

> But I don't think we should.

Luckily we agree!

> > These two seem to me to be the sane minimum, and thus sensible.
> So we'll stick with that for now.
> > > Also, spaces must come between * and ** delimiters, so you
> > > can't say ***this***.
> >
> > Ah, but there's no reason you shouldn't be able to *say **this***,
> > for instance (it's quite unambiguous).
> But I thought that regions had to be ended by valid punctuation or
> space?  Does '*' count as valid punctuation, then?  (Of course,
> I expect your regexps to change significantly when you try to do
> nesting...)

Hmm. Well, it works:

	text: **SS*ee*** --> rendering <strong>SS*ee*</strong>


> But from a more abstract point of view, I think that '***' will end
> up being too confusing.  I don't think it's unreasonable to require
> that people *say **this** *.  At the very least, it seems much
> easier to read (for those who aren't intimately familiar with ST,
> i.e., our entire user base :) )

But I've already noted you have a cavalier attitude to extra spaces
(pace your HTML) - and I'm not convinced on the "easier to read". Ho

> But I guess that if we are to allow it, I think '***' should only
> be allowed to mean "close both strong and emph" or "open both
> strong and emph".. So you shouldn't be able to say::
>   *Too***confusing**

I just tried it - docutils does:

	text: *ee***SS** --> <em>ee***SS*</em>

which seems reasonable enough.

> to mean::
>   *Too* **confusing**

The latter is clearer, certainly!
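
For what it's worth, one set of rules that happens to give the renderings
quoted in this message is: an opening delimiter must follow whitespace (or
start the text), and a closing delimiter must be followed by whitespace
(or end the text). The sketch below is an assumption made for
illustration; the patterns `STRONG` and `EM` and the function
`render_inline` are hypothetical, are not the regexps the tool really
uses, and make no attempt at nesting.

```python
import re

# Hypothetical patterns: opener preceded by whitespace or start of text,
# closer followed by whitespace or end of text.
STRONG = re.compile(r"(?<!\S)\*\*(?=\S)(.+?)(?<=\S)\*\*(?=\s|$)")
EM = re.compile(r"(?<!\S)\*(?=[^\s*])(.+?)(?<=\S)\*(?=\s|$)")


def render_inline(text):
    """Mark up strong first, then emphasis; no nesting is attempted."""
    text = STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)


print(render_inline("**SS*ee***"))           # <strong>SS*ee*</strong>
print(render_inline("*ee***SS**"))           # <em>ee***SS*</em>
print(render_inline("*Too* **confusing**"))  # <em>Too</em> <strong>confusing</strong>
```

Under these rules the '***' in the middle of a word never opens or closes
anything, which is why the spaced-out form is the only one that yields
two separate regions.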

> But just to be clear, I don't think we should allow it at all. :)

I *think* it will all come out in the wash, myself.


Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)