[Doc-SIG] using the same delimiter on the left and right..

Tony J Ibbs (Tibs) tony@lsl.co.uk
Thu, 29 Mar 2001 10:16:52 +0100


Guido van Rossum wrote (in response to Edward Loper):
> > So then we would have the following
> > reserved characters, that may not appear in text without
> > being quoted somehow:
> >     '<'    left delimiter for URLs
> >     '>'    right delimiter for URLs
> >     '#'    delimiter for inlines
> >     '`'    delimiter for literals
> >     '*'    delimiter for emph, maybe for strong.
> >     '::'   marker for literal regions

Hmm. Using backtick for literals might work quite well - what was ST's
reason for not so doing, I wonder?

>
> Yuck.  Most of these (except '::') are quite commonly used for other
> purposes, and occur frequently in examples.

And the problem with that is?

> I prefer markup languages with very few special characters,
> e.g. a GNU doc standard whose name I don't recall

texinfo

> which only uses @; or Perl's POD, which seems to get
> away with making only a letter followed by '<' special.

Yes, but they are not applicable by the "heavyweight markup won't fly"
rule (which has been a principle of the Doc-SIG since 1997, and which
you yourself defended, and which I used to oppose until it was explained
Very Gently and Lots of Times to me why it was important).

Texinfo (and there are other more modern examples) is still "formal
markup to produce a document", where the markup has equal status with
the text, and is expected to intrude. People will not want to write it
in docstrings. So we'd lose.

Pod is used successfully in the Perl world, and is a clear winner there.
I find it intensely unreadable, as a lightweight format.

One of the precepts of the whole Doc-SIG/docstring thing has been that
"marked up" text must be readable *as text*. I'll say again what I seem
to keep saying recently - that means that email is a sensible sort of
model. If we can successfully parse something close to what people type
in email, then we're onto a winner, in terms of getting people to use
it.
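
To make the "email as a model" point concrete, here is a rough sketch
(not a proposal for the spec) of how little machinery it takes to pick
out email-style inline markup. It assumes the conventions from the list
quoted above, '*' around emphasis and '`' around literals; the names
`_INLINE` and `inline_spans` are made up for illustration only.

```python
import re

# Assumed conventions (from the quoted proposal, not an agreed spec):
#   *text*  marks emphasis, `text` marks a literal.
# The emphasis branch refuses an opening '*' followed by whitespace, so
# plain arithmetic such as "2 * 3 * 4" is left alone.
_INLINE = re.compile(r"\*(?P<emph>[^*\s][^*]*?)\*|`(?P<lit>[^`]+)`")


def inline_spans(text):
    """Return (kind, content) pairs for each marked-up span in text."""
    spans = []
    for match in _INLINE.finditer(text):
        if match.group("emph") is not None:
            spans.append(("emph", match.group("emph")))
        else:
            spans.append(("literal", match.group("lit")))
    return spans


if __name__ == "__main__":
    print(inline_spans("This is *really* easy: call `foo()` now."))
```

The input stays perfectly readable as text, which is the whole point;
the software just gets a little extra information out of it.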

> Latex has at least three special characters ('\', '{', '}'),
> and in some contexts more, and that's already a pain.
> XML with '<' and '&' is borderline for me.

We already have dictatorial fiat (first in 1997, reiterated by
you again recently) against LaTeX and SGML/HTML/XML. That's a Good
Thing, since the Doc-SIG as a whole has (each time round the loop)
agreed that all of these are non-flyers for docstring markup. Their
individual deficiencies (if so they be - that's a matter for argument
elsewhere) are thus not relevant.

> > Then the only context-dependent characters that remain would
> > be start-list-item characters..  And if we wanted to, we could
> > use '* ' at the beginning of any list item, since it's
> > reserved anyway... something like:
> >
> >     * this is an unordered list item
> >     *1. this is an ordered list item
>
> This is OK, although I like the single hyphen form better.

There was a proposal last time round the loop to start all list item
"sequences" with a special character (debate obviously ensued on which).
It was dropped as a proposal (I can't remember which side of the debate
I started out on - Doc-SIG has had a history of changing my mind towards
the consensus by reasoned debate - don't you just hate it when that
happens?).

On the whole, I oppose it now. It makes life easier for a parser, but
makes the text much harder for a human being to write.

> > Well.. I'm not sure whether we'd want to do that or not.. We
> > may be happy with just using '1.' and assuming that no one will
> > start a line with a number that ends a sentence..
>
> That was ST's original sin.

Is it a sin? I don't believe that you will get a markup system
(*whatever* its conventions) that doesn't have *some* nooks and crannies
where the user may not type. And if we're worried about (important, yes)
fringe cases like that, why not make the implementation (note, not the
spec) able to give a warning if it looks like the user might have done
that (after all, ending a sentence, in *most* cases, can be spotted due
to punctuation, so it should, often, be feasible).
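
As a minimal sketch of the sort of warning I mean (an implementation
nicety, not part of any spec): flag a line that looks like an ordered
list item when the line before it is ordinary text that has not yet
reached sentence-ending punctuation. The function name, the regular
expression and the choice of punctuation are all assumptions made up for
this illustration.

```python
import re

# Hypothetical pattern: a line that looks like an ordered list item ("1. text").
_LIST_ITEM = re.compile(r"^\s*\d+\.(\s|$)")

# Assumed set of characters that plausibly end a sentence or introduce a list.
_SENTENCE_END = (".", "!", "?", ":")


def suspicious_list_items(text):
    """Return 1-based numbers of lines that look like list items but
    probably just continue the sentence on the previous line."""
    warnings = []
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if not _LIST_ITEM.match(line):
            continue
        previous = lines[index - 1].rstrip() if index > 0 else ""
        # A real list item normally follows a blank line, another list
        # item, or a line that finishes a sentence.
        if (previous
                and not _LIST_ITEM.match(previous)
                and not previous.endswith(_SENTENCE_END)):
            warnings.append(index + 1)
    return warnings


if __name__ == "__main__":
    sample = (
        "The answer to everything is\n"
        "42. That settles it.\n"
        "\n"
        "1. first item\n"
        "2. second item\n"
    )
    print(suspicious_list_items(sample))
```

It will certainly get some cases wrong (a list item that wraps onto a
second line, for instance, would make the following item look
suspicious), which is exactly why it belongs in a tool as a warning
rather than in the spec as a rule.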

> I can't endorse this yet.

I am worried that you, Guido, are coming into a debate which you have
not participated in (note - *that* is not a criticism - there are other
important things I'd like you to have been spending your time on) and
putting down some ground rules which *appear* to contradict
group-wisdom, as derived over the years. I'm a bit uncomfortable with
having to attempt to "channel" the results of that, given I tend to be
opinionated anyway, but even so.

The Doc-SIG has had a disturbing habit of getting *very close* to a
product, and then just petering out. This seems to partially correlate
to the aftermath of a Spam meeting (frustrating if one couldn't be
there), although for entirely different reasons each time, I believe
(i.e., that's hopefully a red herring).

I'd be very interested to know what you consider your "sticking points"
on this to be - it may be that they are nothing we would worry about, it
may be that they are issues we've already argued around in the past.

For one thing, I'd appreciate *someone* explaining to me, slowly and
with illustrations, just what is wrong with having context-sensitive
markup *in docstrings* (not in abstract large documents marked up for
typesetting (a la TeX), not in data specifications marked up for
detailed content retrieval (a la SGML), but in docstrings marked up for
humans to read the markup as text, and for software to retrieve some
extra information for slightly improved presentation and for slightly
improved information extraction).

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive
continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)