[Doc-SIG] Summary of reference/target syntaxes

David Goodger goodger@users.sourceforge.net
Wed, 10 Jul 2002 22:19:36 -0400


Here is a summary of the current and proposed hyperlink syntaxes, with
concrete examples, and my current thinking.

1. Named hyperlinks (in current reStructuredText)::

       This is a named reference_ of one word ("reference").  Here is
       a `phrase reference`_.  Phrase references may even cross `line
       boundaries`_.

       .. _reference: http://www.example.org/reference/
       .. _phrase reference: http://www.example.org/phrase_reference/
       .. _line boundaries: http://www.example.org/line_boundaries/

   Advantages: 

   - The plaintext is readable.
   - Each target may be reused multiple times (e.g., just write
     "reference_" again).
   - No synchronized ordering of references and targets is necessary.

   Disadvantages:

   - The reference text must be repeated as target names; could lead
     to mistakes.
   - The target URLs may be located far from the references, and hard
     to find in the plaintext.

2. Anonymous hyperlinks (in current reStructuredText)::

       This is a named reference__ of one word ("reference").  Here is
       a `phrase reference`__.  Phrase references may even cross `line
       boundaries`__.

       __ http://www.example.org/reference/
       __ http://www.example.org/phrase_reference/
       __ http://www.example.org/line_boundaries/

   Advantages: 

   - The plaintext is readable.
   - The reference text does not have to be repeated.

   Disadvantages:

   - References and targets must be kept in sync.
   - Targets cannot be reused.
   - The target URLs may be located far from the references.
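
To make the "kept in sync" disadvantage of (2) concrete, here is a rough
Python sketch that pairs anonymous references with anonymous targets
purely by order of appearance.  It is not Docutils code: the regular
expressions and the ``pair_anonymous`` helper are simplified assumptions
made up for illustration.

```python
import re

# Simplified, hypothetical patterns (not the real reStructuredText parser):
# an anonymous reference is a word or a backquoted phrase followed by "__".
ANON_REF = re.compile(r"(?:`([^`]+)`|([A-Za-z0-9]+))__(?![A-Za-z0-9_])")
# An anonymous target is a line of the form "__ <URL>".
ANON_TARGET = re.compile(r"^[ \t]*__ (\S+)[ \t]*$", re.MULTILINE)


def pair_anonymous(text):
    """Pair the n-th anonymous reference with the n-th anonymous target."""
    refs = []
    for match in ANON_REF.finditer(text):
        name = match.group(1) or match.group(2)
        # Phrase references may cross line boundaries; normalize whitespace.
        refs.append(" ".join(name.split()))
    targets = ANON_TARGET.findall(text)
    if len(refs) != len(targets):
        raise ValueError(
            "%d anonymous references but %d anonymous targets"
            % (len(refs), len(targets))
        )
    return list(zip(refs, targets))
```

Because the pairing is positional, adding or removing a reference
without touching the matching target either raises the error above or,
if the counts still happen to agree, silently points later references at
the wrong URLs.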

3. The proposed inline external target syntax::

       This is a named reference__ __<http://www.example.org/
       reference/> of one word ("reference").  Here is a `phrase
       reference`__ __<http://www.example.org/phrase_reference/>.

   Advantages: 

   - The target is specified immediately adjacent to the reference,
     improving maintainability:

     - References and targets are easily kept in sync.
     - The reference text does not have to be repeated.

   Disadvantages:

   - Poor plaintext readability.
   - Targets cannot be reused (but see below).

   To alleviate the readability issue slightly, we could allow the
   target to appear later, such as after the end of the sentence::

       This is a named reference__ of one word ("reference").
       __<http://www.example.org/reference/>  Here is a `phrase
       reference`__.  __<http://www.example.org/phrase_reference/>

   This could only work for one reference at a time (reference/target
   pairs must be proximate [refA trgA refB trgB], not interleaved
   [refA refB trgA trgB] or nested [refA refB trgB trgA]).  Perhaps
   this restriction is too onerous; then references and targets would
   have to be immediately adjacent.

   The above syntax is actually for "anonymous inline external
   targets", emphasized by the double underscores.  It follows that
   single trailing & leading underscores would lead to implicitly
   named inline external targets.  This would allow the reuse of
   targets by name.  So after "reference_ _<target>", another
   "reference_" would point to the same target.
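
The proximity restriction described in (3) can be sketched in a few
lines of Python.  This is only an illustration under assumed inputs:
the ``pair_proximate`` helper and its ``("ref", ...)`` / ``("target",
...)`` token tuples are hypothetical, not part of any existing parser.

```python
def pair_proximate(tokens):
    """Pair each reference with the target that directly follows it.

    `tokens` is a list of ("ref", text) or ("target", uri) tuples in
    document order.  Only the [refA trgA refB trgB] ordering is accepted.
    """
    pairs = []
    pending = None  # the reference still waiting for its target
    for kind, value in tokens:
        if kind == "ref":
            if pending is not None:
                # A second reference before a target: interleaved or nested.
                raise ValueError(
                    "reference %r has no target before %r" % (pending, value)
                )
            pending = value
        elif kind == "target":
            if pending is None:
                raise ValueError("target %r has no preceding reference" % value)
            pairs.append((pending, value))
            pending = None
        else:
            raise ValueError("unknown token kind %r" % kind)
    if pending is not None:
        raise ValueError("reference %r has no target" % pending)
    return pairs
```

With this rule, both the interleaved [refA refB trgA trgB] and the
nested [refA refB trgB trgA] orderings are rejected at the second
reference.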

4. If it is best for references and inline external targets to be
   immediately adjacent, they might as well be integrated.  Here's an
   alternative syntax embedding the target URL in the reference::

       This is a named `reference <http://www.example.org/reference
       />`__ of one word ("reference").  Here is a `phrase reference
       <http://www.example.org/phrase_reference/>`__.

   Advantages and disadvantages are the same as in (3).  Readability
   is still an issue, but the syntax is a bit less heavyweight.

   There's a problem with this syntax: how to refer to a title like
   "HTML Anchors: <a>" (ending with an HTML/SGML/XML tag)?  We could
   either require more syntax on the target (like "`reference text
   __<http://example.com/>`__"), or require the odd conflicting title
   to be escaped (like "`HTML Anchors: \<a>`__").  The latter seems
   preferable.

   Similarly to (3) above, a single trailing underscore would convert
   the reference & inline external target from anonymous to implicitly
   named, allowing reuse of targets by name.
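
As a minimal sketch of how the embedded syntax in (4) and the escaping
rule might be recognized, the following Python splits the text between
the backquotes into reference text and URL.  The pattern and the
``split_embedded`` helper are assumptions for illustration only, not an
actual implementation.

```python
import re

# Hypothetical rule: reference text, whitespace, then "<URI>" at the very end.
# An escaped "\<" never matches, because the backslash sits between the
# whitespace and the "<".
EMBEDDED = re.compile(r"^(.*?)\s+<([^<>]+)>$", re.DOTALL)


def split_embedded(content):
    """Return (reference text, URI or None) for the text inside backquotes."""
    match = EMBEDDED.match(content)
    if match is None:
        # No embedded target: drop the escape and normalize whitespace.
        return " ".join(content.replace("\\<", "<").split()), None
    text = " ".join(match.group(1).split())
    # The URI may have been wrapped across lines; remove all whitespace.
    uri = "".join(match.group(2).split())
    return text, uri
```

Under these assumptions, an unescaped title such as ``HTML Anchors: <a>``
is misread as the text "HTML Anchors:" with the target "a", whereas the
escaped form ``HTML Anchors: \<a>`` is kept whole as reference text.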

5. For comparison and historical background, StructuredText has two
   syntaxes for hyperlinks.  First, ``"reference text":URL``::

       This is a named "reference":http://www.example.org/reference/
       of one word ("reference").  Here is a "phrase
       reference":http://www.example.org/phrase_reference/.

   Second, ``"reference text", http://example.com/absolute_URL``::

       This is a named "reference", http://www.example.org/reference/
       of one word ("reference").  Here is a "phrase reference",
       http://www.example.org/phrase_reference/.

   Advantages: 

   - The target is specified immediately adjacent to the reference.

   Disadvantages:

   - Poor plaintext readability.
   - Targets cannot be reused.
   - Both syntaxes use double quotes, common in ordinary text.
   - In the first syntax, the URL and the last word are stuck
     together, exacerbating the line wrap problem.
   - The second syntax is too magical; text could easily be written
     that way by accident (although only absolute URLs are recognized
     here, perhaps because of the potential for ambiguity).

With any kind of inline external target syntax it comes down to the
conflict between maintainability and plaintext readability.  I don't
see a major problem with reStructuredText's maintainability, and I
don't want to sacrifice plaintext readability to "improve" it.

The proponents of inline external targets want them for easily
maintainable web pages.  The arguments go something like this:

- Named hyperlinks are difficult to maintain because the reference
  text is duplicated as the target name.

  To which I said, "So use anonymous hyperlinks."

- Anonymous hyperlinks are difficult to maintain because the
  references and targets have to be kept in sync.

  "So keep the targets close to the references, grouped after each
  paragraph.  Maintenance is trivial."

- But targets grouped after paragraphs break the flow of text.

  "Surely less than URLs embedded in the text!  And if the intent is
  to produce web pages, not readable plaintext, then who cares about
  the flow of text?"

As is probably obvious, I'm ambivalent/against the proposed "inline
external targets".  I value reStructuredText's readability very
highly, and although it may add some convenience, the "inline external
target" syntax(es) compromise that readability IMO.  Unless something
changes (better syntax, new & better arguments and/or use cases), the
best result this proposal can hope for is inclusion as "experimental
syntax" via a pragma directive.

-- 
David Goodger  <goodger@users.sourceforge.net>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/