[Doc-SIG] rST hyperlink syntax

Alan Jaffray jaffray@pobox.com
Wed, 17 Oct 2001 19:37:02 -0400 (EDT)

> >     __ http://mail.python.org/pipermail/doc-sig/
> Orthogonality again: two ways of spelling the same thing. I think the only
> way such a simplification would be acceptable is if it simplified the
> general case, thus becoming the one and only spelling.

We can use the leading ``__ `` to improve other cases.  The leading
``.. `` is currently heavily overloaded and somewhat ambiguous::

    .. __: http://somewhere       anonymous uri
    .. _blah: http://somewhere    named uri
    .. _blah:                     anchor
    .. [blah] http://somewhere    footnote
    .. blah: http://somewhere     comment
    .. blah:: http://somewhere    directive

Anchors and named URIs are entirely different - one is a marker of a
particular position in the document, the other associates a name with
an external resource and has identical semantics regardless of its
position in the document - yet they have the same syntax.  Comments
are freeform, unless they happen to start with an underscore, in which
case they can't contain a colon on the first line.  All of them start
with a comment syntax intended to imply "hidden", except that footnotes
usually appear in visible output, and link URLs sometimes do.
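
To make the overloading concrete, here is a rough sketch of what telling these constructs apart looks like.  It is not taken from the reST parser; the patterns and the ``classify`` name are made up for illustration, and they only cover the six one-line cases listed above.

```python
import re

# Hypothetical patterns, for illustration only.  Every construct shares the
# ".. " prefix, so the specific patterns must be tried before the catch-all
# comment pattern at the end.
PATTERNS = [
    ("anonymous uri", re.compile(r"^\.\. __: \S+$")),
    ("named uri",     re.compile(r"^\.\. _[^:]+: .+$")),
    ("anchor",        re.compile(r"^\.\. _[^:]+:$")),
    ("footnote",      re.compile(r"^\.\. \[[^\]]+\] .+$")),
    ("directive",     re.compile(r"^\.\. \w+:: .*$")),
    ("comment",       re.compile(r"^\.\. .*$")),
]


def classify(line):
    """Return the name of the first construct whose pattern matches."""
    for name, pattern in PATTERNS:
        if pattern.match(line):
            return name
    return None


for sample in [
    ".. __: http://somewhere",
    ".. _blah: http://somewhere",
    ".. _blah:",
    ".. [blah] http://somewhere",
    ".. blah: http://somewhere",
    ".. blah:: http://somewhere",
    ".. _note: see above",   # meant as a comment, read as a named uri
]:
    print(f"{sample:32} {classify(sample)}")
```

The last sample is the constraint on comments in action: a comment that happens to start with an underscore and contains a colon gets read as a target in this sketch.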

IMHO, the attempt to cram these various constructs into ``.. `` seems
like a mess, with very little advantage.  If we ditch that requirement::

    __ http://somewhere           anonymous uri
    __ blah: http://somewhere     named uri
    __ _blah                      anchor
    __ [blah] blah blah           footnote
    .. blah: http://somewhere     comment
    .. directive:: foo            directive

Anonymous URIs are much shorter.  Named targets are shorter and more
readable.  Anchors are shorter, no longer look like refuris, and don't 
have the misleading colon which leads one to expect another argument,
containment of a block, or both.  Everything which can be linked to
starts with ``__ ``, and we've strongly established underscore as 
meaning "link" (as opposed to ``.. `` which means "hidden"), so we've
improved the mnemonic value.  ``.. `` is less overloaded, so comments
are less constrained by potential ambiguity with other constructs.

There's a potential ambiguity between anchors and anonymous URIs starting
with an underscore or square bracket.  The URI would have to be escaped
in that case; no big deal.  There's potential for human confusion between
anonymous and named URIs, but cases where this is even a possibility seem
extremely unlikely in practice.  (I suppose you could overlook the lack of
a space and mistake the anonymous URI ``news:comp.lang.python`` for the
URI ``comp.lang.python`` named ``news``, but that seems highly unlikely,
and that's the worst realistic example I can construct.)
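
A minimal sketch of how the proposed ``__ `` forms could be told apart, assuming the rule that a named URI needs whitespace after its colon.  The function name, the return tuples, and the order of the checks are my own assumptions, not part of any existing parser.

```python
import re


def classify(line):
    """Classify one line written in the proposed ``__ `` syntax."""
    if not line.startswith("__ "):
        return None
    body = line[3:].strip()
    # A leading underscore marks an anchor.
    if body.startswith("_"):
        return ("anchor", body[1:])
    # A leading square bracket marks a footnote.
    m = re.match(r"\[([^\]]+)\]\s+(.*)", body)
    if m:
        return ("footnote", m.group(1), m.group(2))
    # A named URI is "name:", then whitespace, then the URI.
    m = re.match(r"([^:]+):\s+(\S+)$", body)
    if m:
        return ("named uri", m.group(1), m.group(2))
    # Anything else is taken to be an anonymous URI.
    return ("anonymous uri", body)


for sample in [
    "__ http://somewhere",
    "__ blah: http://somewhere",
    "__ _blah",
    "__ [blah] blah blah",
    "__ news:comp.lang.python",
    "__ news: comp.lang.python",
]:
    print(f"{sample:30} {classify(sample)}")
```

The two ``news`` lines differ only by a space, which is the human-confusion case described above.  An anonymous URI that itself begins with an underscore or a square bracket would land in the anchor or footnote branch here, which is why it would need escaping.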

I'll confess that I don't buy the argument that "allowing explicit markup
to start with something other than ``.. `` adds complexity".  Keeping the
same number of constructs while making some of them simpler seems like a
reduction of complexity to me, and I don't think consistency has been
harmed.  But even if you do believe it's slightly more complex, I think
there are enough advantages to justify it.

I'd actually prefer a still different syntax, but it's further out in
left field, since it requires changing some connotations. ::

    .. http://somewhere           anonymous uri
    .. blah: http://somewhere     named uri
    .. _blah                      anchor
    .. [blah] blah blah           footnote
    ## blah: http://somewhere     comment
    !! directive: foo             directive

``##`` strongly evokes "comment" for those familiar with any of a variety
of scripting languages.  It doesn't look hidden, but having comments look
hidden is a mixed blessing -- IMHO comments *should* jump out at you.
This isn't an uncommon stylistic preference; for evidence, note
that ``#`` and ``/* */`` and ``<!-- -->`` are fairly aggressive characters,
and many text editors default to displaying comments in bright red or 
otherwise emphasized text.  Relatively few languages use low-key markup
like ``..`` or ``--`` for comments.

``!!`` is intended to evoke "something odd or surprising is happening here".
It may also evoke shell escapes for a few hard-core Unix folks.  By separating
out comments from directives, we remove ambiguity, and directives can lose
their wartlike double-colon.

``.. `` loses the "hidden" meaning and instead means "leading up to" or
"side note", and is thus used for targets.  I believe it evokes this as
strongly as "hidden" in uninitiated users.  (At least, it did so for me.
Other people I've asked seem to have no resonance with any of these
associations.)  It's cleaner to read and easier to type than either
``__ `` or various constructs starting with ``.. _``, which is a
significant advantage given its frequency...

...alternately, we could lose ``.. `` entirely.  It has ambiguity issues,
as in this paragraph, and ``__ `` does have the edge in mnemonic value
since we're already using underscore for links.

(regarding anonymous hyperlinks)
> An interesting idea, yes, but I'm not sure about its frequency. It wasn't
> missed until now.

Keep in mind that you're talking to a very unusual user base, if your
goals for reST include having it be a general-purpose language for short
web pages and other documents.

Program documentation tends to be dense on certain types of markup - like
lists, for example, which is part of why you have five kinds of lists
defined in the core spec - and light on other kinds - like hyperlinks,
especially hyperlink targets with long link text, since much of what you're
linking to are things like variable and function names, which tend to be
short and which will probably have implicit targets defined through other
document structure.

I did some grepping through the reST distribution and my department's 
intranet website, looking at usage patterns.  It's interesting.

|           |   Size   | List items |  Targets  |
| reST docs |   198K   |     469    |     39    |
|  website  |   794K   |    1779    |   1355    |

We use explicit link targets about **10 times** as heavily as you do. [#]
The link texts were generally longer as well.  This may explain our 
relative levels of concern about the current target syntax.

.. [#] These numbers are after factoring out multiple links within
   a document pointing to the same target, which would be written 
   as a single target in reST.
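
For anyone who wants to check the arithmetic, here is a quick sketch using the figures from the table above.  Normalizing by size and by list items are both just assumptions about what "as heavily" should mean; the variable names are invented for the example.

```python
# Figures copied from the table above: size in KB, list items, targets.
corpora = {
    "reST docs": {"size_kb": 198, "list_items": 469, "targets": 39},
    "website": {"size_kb": 794, "list_items": 1779, "targets": 1355},
}


def density(name, per):
    """Explicit targets per unit of `per` for the named corpus."""
    corpus = corpora[name]
    return corpus["targets"] / corpus[per]


# How many times denser the website is in targets than the reST docs.
ratio_by_size = density("website", "size_kb") / density("reST docs", "size_kb")
ratio_by_items = density("website", "list_items") / density("reST docs", "list_items")

print(f"targets per KB:        {ratio_by_size:.1f}x")
print(f"targets per list item: {ratio_by_items:.1f}x")
```

Either way of normalizing lands at roughly nine times, which is where the "about 10 times" comes from.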

People have a lot of different usage patterns.  A group of programmers -
let alone a group of programmers of a particular language in a particular
interest group - will not exhibit all the patterns that will come up in
a broader group.  My guess is that if you were, say, talking to a Zope
SIG and trying to replace StructuredText as a standard markup for Zope,
rather than talking to a Python SIG and trying to institute a standard
for docstrings, you would have heard more concerns about target syntax
by now.

> In our realm of human-language text parsing, it's hard to balance the
> contradictory desires for simplicity, clarity, consistency, and
> orthogonality. Something's gotta give. A little education is all that's
> necessary to get past the less-than-ideal bits. A lot less education
> than if you standardized on HTML though!

Yeah, but they already *know* HTML... (*sigh*)  Everyone and their monkey
knows HTML.

We've got a couple dozen people who know stuff that should be written down
but are too busy to do it.  We want to encourage them to write this stuff
down.  Giving them something which makes doing it quicker and less annoying,
with the nontrivial side benefits of providing richer structure for tools
to index and readable plaintext for use in applications that need it,
would be a huge win.

Unfortunately this means we're limited to about a two-minute learning curve,
and tolerance for anything that's less convenient or efficient than what
they already know will be extremely low.  I showed STX to a couple of them,
and one of the first things they commented on as being really cool was the
hyperlink syntax.  They use links all the time, and writing and reading ::

    "Python home page":http://www.python.org

instead of ::

    <a href="http://www.python.org">Python home page<a/>

is a clear and tangible win.  (Even if they didn't accidentally transpose
the "a" and the "/" in the closing tag thus accidentally including the
rest of the document in the link and requiring them to curse a lot and 
then go back and fix it. :-) )

Of course, this particular syntax also leads to problems further down
the road, which is why I'm talking with you rather than just using STX.
I could probably sell them on ::

    `Python home page`__

    __ http://www.python.org

instead, but ::

    `Python home page`__

    .. __: http://www.python.org

is not a clear win, and ::

    `Python home page`_

    .. _Python home page: http://www.python.org

is right out.  We'll have cases where named links will be useful, but we'll
have many other cases where names are pointless, and I believe the latter
are a large majority for most of our docs.
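
One crude way to put numbers on "clear win" is to count the characters an author types beyond the link text and the URL themselves.  This sketch does that for the forms shown above, using a correctly closed HTML tag; counting characters is only a stand-in for effort, newlines are ignored, and the names are invented for the example.

```python
text = "Python home page"
url = "http://www.python.org"

# Each entry lists the source lines an author would type for one link.
forms = {
    "STX": [f'"{text}":{url}'],
    "HTML": [f'<a href="{url}">{text}</a>'],
    "proposed anonymous": [f"`{text}`__", f"__ {url}"],
    "current anonymous": [f"`{text}`__", f".. __: {url}"],
    "current named": [f"`{text}`_", f".. _{text}: {url}"],
}


def overhead(lines):
    """Characters typed beyond one copy of the link text and the URL."""
    return sum(len(line) for line in lines) - len(text) - len(url)


for name, lines in forms.items():
    print(f"{name:20} {overhead(lines):3d}")
```

By this measure STX costs 3 extra characters, the proposed anonymous form 7, the current anonymous form 11, HTML 15, and the current named form 25, mostly because the link text has to be typed twice.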

For what it's worth, the other place I'm hoping to use reST has users who are
even *less* patient and trainable.  It's an online journal and discussion site.
There are a few hundred thousand users, and the median user is a 16-year-old
girl with no technical skills (except some basic HTML) who spends a couple
of hours a week on the site complaining that no one really understands her.
But reST is still very close to being a perfect fit for the application.

I'd also like to see a clearly-specified markup completely displace STX,
ending STX `dialect proliferation`_ in Zope once and for all.  reST won't
do this unless it's a clear win for all current STX users, which includes
eliminating all areas in which reST is more awkward than STX.  As far as
I can see, hyperlink syntax is the only such area.



.. _dialect proliferation: