[Doc-SIG] using the same delimiter on the left and right..

Guido van Rossum guido@digicool.com
Thu, 29 Mar 2001 11:48:36 -0500


> > Yuck.  Most of these (except '::') are quite commonly used for other
> > purposes, and occur frequently in examples.
> 
> And the problem with that is?

That they will frequently need to be escaped in order to prevent
special interpretation.  Note that an escape character was absent from
the list -- that's a big mistake, I think!

> > I prefer markup languages with very few special characters,
> > e.g. a GNU doc standard whose name I don't recall
> 
> texinfo
> 
> > which only uses @; or Perl's POD, which seems to get
> > away with making only a letter followed by '<' special.
> 
> Yes, but they are not applicable by the "heavyweight markup won't fly"
> rule (which has been a principle of the Doc-SIG since 1997, and which
> you yourself defended, and which I used to oppose until it was explained
> Very Gently and Lots of Times to me why it was important).

Well, after using ST, I'm not so sure I agree with that rule any
more.  I think HTML is too heavy, but I think reserving a dozen or so
characters for special purposes is also wrong.

> Texinfo (and there are other more modern examples) is still "formal
> markup to produce a document", where the markup has equal status with
> the text, and is expected to intrude. People will not want to write it
> in docstrings. So we'd lose.

But isn't this exactly what Javadoc does?

> Pod is used successfully in the Perl world, and is a clear winner there.
> I find it intensely unreadable, as a lightweight format.

I haven't seen too much POD, so you may be right there.  Is it worse
than LaTeX?

> One of the precepts of the whole Doc-SIG/docstring thing has been that
> "marked up" text must be readable *as text*. I'll say again what I seem
> to keep saying recently - that means that email is a sensible sort of
> model. If we can successfully parse something close to what people type
> in email, then we're onto a winner, in terms of getting people to use
> it.

Watch out though.  As soon as you're getting into heuristics too much,
our ways part.  I want very clear, exact and predictable rules.

> > LaTeX has at least three special characters ('\', '{', '}'),
> > and in some contexts more, and that's already a pain.
> > XML with '<' and '&' is borderline for me.
> 
> We already have existing dictatorial fiat (first in 1997, reiterated by
> you again recently) against LaTeX and SGML/HTML/XML. That's a Good
> Thing, since the Doc-SIG as a whole has (each time round the loop)
> agreed that all of these are non-flyers for docstring markup. Their
> individual deficiencies (if so they be - that's a matter for argument
> elsewhere) are thus not relevant.

Sure.  Though I've got a feeling that I'm disagreeing with "the
doc-sig as a whole" a lot.  Maybe I should just withdraw (again) from
this whole discussion and let you all decide what you like, as long as
it doesn't have to be used for the standard library?

> > > Then the only context-dependent characters that remain would
> > > be start-list-item characters..  And if we wanted to, we could
> > > use '* ' at the beginning of any list item, since it's
> > > reserved anyway... something like:
> > >
> > >     * this is an unordered list item
> > >     *1. this is an ordered list item
> >
> > This is OK, although I like the single hyphen form better.
> 
> There was a proposal last time round the loop to start all list item
> "sequences" with a special character (debate obviously ensued on which).
> It was dropped as a proposal (I can't remember which side of the debate
> I started out on - Doc-SIG has had a history of changing my mind towards
> the consensus by reasoned debate - don't you just hate it when that
> happens?).
> 
> On the whole, I oppose it now. It makes it easier for a parser, and much
> harder for a human being, to write text.

I think whitespace (a blank line and/or indentation) should be enough
to recognize the start of a list.

> > > Well.. I'm not sure whether we'd want to do that or not.. We
> > > may be happy with just using '1.' and assuming that no one will
> > > start a line with a number that ends a sentence..
> >
> > That was ST's original sin.
> 
> Is it a sin? I don't believe that you will get a markup system
> (*whatever* its conventions) that doesn't have *some* nooks and crannies
> where the user may not type. And if we're worried about (important, yes)
> fringe cases like that, why not make the implementation (note, not the
> spec) able to give a warning if it looks like the user might have done
> that (after all, ending a sentence, in *most* cases, can be spotted due
> to punctuation, so it should, often, be feasible).

Well, it would be all right if it only recognized numbers after a blank
line.  It's a pain if it latches onto any "^\d+\." in the middle of a
text block, because (in my experience) that's never a numbered item,
it always just happens to be a sentence ending in a number.
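
To illustrate the difference, here is a minimal sketch in Python.  The
function names and the sample text are hypothetical, not taken from any
ST implementation, and the pattern requires a space after the period,
which is an assumption beyond the "^\d+\." quoted above.  The strict
variant only detects the *start* of a list; how following items are
recognized is left open.

```python
import re

# Hypothetical pattern for a numbered item: digits, a period, a space.
NUMBERED = re.compile(r"^\d+\. ")

def greedy_items(text):
    """Treat every line that matches the pattern as a list item."""
    return [line for line in text.splitlines() if NUMBERED.match(line)]

def strict_items(text):
    """Treat a matching line as a list item only if it follows a blank
    line (or is the very first line of the text)."""
    items = []
    prev_blank = True
    for line in text.splitlines():
        if prev_blank and NUMBERED.match(line):
            items.append(line)
        prev_blank = not line.strip()
    return items

# A sentence ending in a number that happens to wrap onto a new line,
# followed by a real numbered item after a blank line.
sample = (
    "The default buffer size used by this module is\n"
    "1024. Larger values are accepted too.\n"
    "\n"
    "1. Open the file.\n"
)

print(greedy_items(sample))
print(strict_items(sample))
```

The greedy version reports the wrapped "1024." line as a list item; the
strict version reports only the item that follows the blank line.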

> > I can't endorse this yet.
> 
> I am worried that you, Guido, are coming into a debate which you have
> not participated in (note - *that* is not a criticism - there are other
> important things I'd like you to have been spending your time on) and
> putting down some ground rules which *appear* to contradict
> group-wisdom, as derived over the years. I'm a bit uncomfortable with
> having to attempt to "channel" the results of that, given I tend to be
> opinionated anyway, but even so.

Well, you (as a group) asked me my opinion, which I gave.  If you
don't like it, fine, I'll bail out again, I *do* have other things to
do.  Also note that I repeatedly requested to see the spec you (again
as a group) had arrived at, and nobody has pointed me to it.  Given
that the doc-sig has been going around in circles since 1997, I worry
that it's never going to reach a conclusion -- with or without my
involvement.

> The Doc-SIG has had a disturbing habit of getting *very close* to a
> product, and then just petering out. This seems to partially correlate
> to the aftermath of a Spam meeting (frustrating if one couldn't be
> there), although for entirely different reasons each time, I believe
> (i.e., that's hopefully a red herring).

Lots of things get a jolt of energy at a Python conference (can we
stop calling them spams?) and then peter out.  The types-sig has seen
this phenomenon too.  I guess it's because real life takes over after
a while.

> I'd be very interested to know what you consider your "sticking points"
> on this to be - it may be that they are nothing we would worry about, it
> may be that they are issues we've already argued around in the past.

Show me your spec and I'll review it.  You can't expect me to lay out
ground rules without knowing where your thinking is going.

> For one thing, I'd appreciate *someone* explaining to me, slowly and
> with illustrations, just what is wrong with having context-sensitive
> markup *in docstrings* (not in abstract large documents marked up for
> typesetting (a la TeX), not in data specifications marked up for
> detailed content retrieval (a la SGML), but in docstrings marked up for
> humans to read the markup as text, and for software to retrieve some
> extra information for slightly improved presentation and for slightly
> improved information extraction).

I believe the problem is with the required preciseness of docstrings.
Docstrings are not like email, where the reader can usually guess what
you meant despite typos and transmission glitches.  Imagine a
docstring describing a regular expression-like language.  Can you see
the damage that could be done by inadvertently changing all double
backslashes into single backslashes, or interpreting *...* as bold
(hence dropping the *s)?  There are lots of situations like this.
(E.g. I recently noticed that Ping made some docstring a raw string
because it contained examples involving \r and \n.)
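
A small Python sketch of both hazards.  The docstrings and the
naive_bold() helper are made up for illustration; they are not from any
proposed docstring processor.

```python
import re

def plain():
    "Match a tab with \t or a literal backslash with \\."

def raw():
    r"Match a tab with \t or a literal backslash with \\."

def naive_bold(text):
    """Hypothetical processor rule: render *word* as bold markup."""
    return re.sub(r"\*([^*]+)\*", r"<b>\1</b>", text)

# In the plain docstring the compiler has already turned \t into a tab
# character and \\ into a single backslash; the raw docstring keeps every
# character exactly as typed.
print(repr(plain.__doc__))
print(repr(raw.__doc__))

# A regular expression in running text loses its stars.
print(naive_bold("The pattern a*b*c matches 'abc'."))
```

In the last call the stars that were part of the pattern are consumed as
markup, so the documented regular expression is no longer the one the
author wrote.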

Every character counts, and so does every bit of whitespace -- at
least sometimes, and the docstring processor can't be smart enough to
always know when.

--Guido van Rossum (home page: http://www.python.org/~guido/)