[Doc-SIG] Alternative inline markup

Tue, 06 Nov 2001 23:31:11 -0500

[Alan]
> Here are my suggested changes to the current inline markup system.

Thanks for your input. I wish it had come earlier though! If you've
been following the \*-checkins lists, you've noticed that I've already
written the specs and checked in the implementation of::

    Substitutions: `/text/` `/picture/`.

    .. /text/ If you're happy and you know it
    .. /picture/ image:: clapping_hands.png

Except for Tony's neutral note, I haven't seen any reaction to this
construct & syntax yet (posted last week as "Inline Substitutions").
Alan?

Oh well. All I have to do is make a concerted effort to be objective
when reviewing the suggestions. [Wrenching noise due to ego
separation.] OK, done.

> 1) Inline markup can be nested::
...
>    Ambiguity is resolved with close tags first, then maximal munch.

I don't follow. Clarify please?

The easiest way I see to implement this is to first identify the outer
inline markup as we do now, then recursively scan for nested inline
markup. It won't work in the general case though, as explained below.
Changing the parse algorithm to be fully nested-inline-markup-friendly
could be difficult and/or ambiguous. I'm not sure the general case
*can* be done, without a lot of exceptions and special rules, which
complexity I'm not willing to add to reStructuredText.

>    If you are sick enough to try::
> 
>       ***Strong enclosing emphasis***
>       **Strong enclosing *emphasis***
>       *Emphasis enclosing **strong***
> 
>    then the first two will work and the third won't.

Actually, all of those would work with the outer-to-inner recursive
algorithm. The current definition of inline markup treats as
significant the whitespace, punctuation, or bracketing before the
start-string and after the end-string. These would work too::

    **Strong enclosing *emphasis* in the middle.**
    ***Emphasis* inside strong.**

These wouldn't work with the current definition though::

    *Emphasis enclosing **strong** in the middle.*
    ***Strong** inside emphasis.*

For the first example, the last asterisk of the closing "**" of
"strong" would be recognized as a closing "*". For the second, strong
emphasis is recognized first, without lookahead; the "*" after
"emphasis." wouldn't be significant.

We'd have to refine/redefine the algorithm to work with such cases.
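
To make the cases above concrete, here is a minimal Python sketch of
the outer-to-inner recursive scan. It is not the actual parser code:
the start-string and end-string rules are simplified, and the names
``parse``, ``is_start``, ``find_end``, and the character sets are
assumptions made for illustration only.

```python
# Simplified context rules (an assumption, not the real parser's tables).
START_BEFORE = set(" \t\n'\"([{<-/:")      # may precede a start-string
END_AFTER = set(" \t\n'\")]}>-/:.,;!?")    # may follow an end-string
MARKUP = [("**", "strong"), ("*", "emphasis")]  # longest start-string first


def is_start(text, i, mark):
    """True if `mark` at index i is a start-string: it begins the text or
    follows whitespace/opening punctuation, and non-whitespace follows it."""
    j = i + len(mark)
    return (text.startswith(mark, i)
            and (i == 0 or text[i - 1] in START_BEFORE)
            and j < len(text) and not text[j].isspace())


def find_end(text, start, mark):
    """Index of the first valid end-string after `start`, or -1.
    No lookahead: the first candidate in a valid context wins."""
    i = text.find(mark, start)
    while i != -1:
        j = i + len(mark)
        if (i > start and not text[i - 1].isspace()
                and (j == len(text) or text[j] in END_AFTER)):
            return i
        i = text.find(mark, i + 1)
    return -1


def parse(text):
    """Identify outer inline markup left to right, then recurse into it.
    Returns a list of plain strings and (name, children) tuples."""
    nodes, plain, i = [], "", 0
    while i < len(text):
        for mark, name in MARKUP:
            if is_start(text, i, mark):
                end = find_end(text, i + len(mark), mark)
                if end != -1:
                    if plain:
                        nodes.append(plain)
                        plain = ""
                    nodes.append((name, parse(text[i + len(mark):end])))
                    i = end + len(mark)
                    break
        else:
            # No markup starts here; keep the character as plain text.
            plain += text[i]
            i += 1
    if plain:
        nodes.append(plain)
    return nodes
```

Under these assumptions, ``parse("*Emphasis enclosing **strong***")``
nests strong inside emphasis, while
``parse("***Strong** inside emphasis.*")`` yields strong text
containing a literal ``*Strong`` followed by plain text, with the
trailing asterisk left unrecognized.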

>    If the user expects any of them to work without consulting
>    documentation, they're foolish.

With a little bit of experience, I'd say such expectations would be
common. Nested inline markup should be obvious and orthogonal in the
general case, or nonexistent.

>    Besides, if you want the third, you can do::
> 
>       `Emphasis enclosing **strong**`__
> 
>        __ emphasis

This goes against the design goals of reStructuredText. So, not on my
watch. ;-)

> 2) An underscore suffix currently modifies the preceding text by
>    making it a link. This notion is extended - the suffix indicates
>    that the text is to be tagged in some way, indicated by a
>    directive or destination URL in the target::
> 
>       I had lunch with Jonathan_ today.  We talked about Zope_.
> 
>       .. _Jonathan: lj [user=jhl]
>       .. _Zope: http://www.zope.org/

Interesting idea, putting arbitrary constructs in the link target.
However, for consistency that depends on two things:

1. The link text remains behind, untouched except for being
   "activated" in some way.
2. There must *be* a link target. Corollary: the reference must *be*
   a reference.

What will "Jonathan" become? A clickable hyperlink to something? Or a
user image from the database? For example, say I want Jonathan's user
icon to appear in my paragraph::

    I had lunch with [Jonathan's icon here] today.

How do I do this *without* having a hyperlink at the same time?

On the other hand, we could say that the trailing-underscore syntax
doesn't signify a hyperlink reference, but only indicates a "tagging
reference". A tagging reference becomes a hyperlink reference if the
contents of the "tag" resolve to a hyperlink. And how do we do the
straight icon-substitution example? Would we *replace* the reference
text depending on the contents of the "tag"? This seems too indirect
and complicated for easy comprehension. It's too much.

I've asked this before, and I really would like to know: what does the
"lj" tag *do* in the end? Can you show us some HTML output?

>    Link targets which are also legal directive names must be
>    enclosed in backquotes.

Link targets would be far more frequent than directives, so the
markup would suffer from the extra syntax on targets.

I thought of this alternative syntax::

    I had lunch with Jonathan_ today.  We talked about Zope_.

    .. _Jonathan: lj:: user=jhl
    .. _Zope: http://www.zope.org/

But it suffers from the same conceptual problems: references in the
text sometimes become links and sometimes don't, and we can't tell
which *at the markup*.

> 3) Substitution becomes a directive:

... combined with a hyperlink target. No, I don't think so.
Substitutions are going to be relatively rare, and should have
distinct syntax.

The syntax you're proposing is internally inconsistent.

> 4) Inside markup delimited by backquotes or curly braces, curly
>    braces may be used as delimiters equivalent to backquotes::
...
>    This is because backquotes don't nest.

There's no difference between backquotes and asterisks with regard to
nesting. Unless you're referring to double-backquotes: ``no further
processing of `backquotes` in inline literals``?

Why the fixation on curly braces? :>

> 5) Roles can go away.  We don't need them.  Optionally if we want
>    the ability to put short directive names inline, we could
>    declare ::
> 
>       `foo:: bar bar bar`

Similar syntax has already been considered and rejected. See
http://structuredtext.sf.net/spec/alternatives.txt, "Interpreted Text
'Roles'" alternative 1.

> Summary:
> 
> - We gain nesting.

Not without significant work, though. If it's even possible
unambiguously, it can be added independently later.

> - We gain arbitrary extensibility of inline markup.
> - We gain substitutions.

But at the expense of complicating hyperlinks.

> - We retain unobtrusive markup.

Debatable, since it adds complexity to the underlying concepts.

> - We lose by occasionally having to escape a curly brace inside
>   backquotes, or quote a hyperlink target with no ``/`` or ``#``
>   characters to distinguish it from a directive name.

That last one is a significant loss.

[Tony]
> Immediate off-the-cuff comments - but for inline markup usage, I
> think that's actually what one *wants*...

I don't follow.

> 1. We want a *simple* markup scheme, so it is easy to
>    learn and easy to remember (all of). I think David's
>    latest suggestion about quoted slashmarky things
>    (about which I am undecided) is pushing the very
>    edge of that. *Unless* Alan's scheme *does* simplify
>    overall, that added complexity makes it a non-starter
>    for me.

Which added complexity?

> 2. We do *not* have to get the whole thing right at
>    the start, so long as any extensions/additions
>    can be carefully added at a later stage. For this
>    sort of purpose, having things like directives,
>    roles, and so on, is a *good* thing - they allow
>    one to extend without changing the format.
>    (i.e., losing roles isn't necessarily such a
>    good thing as it sounds).

They're meant as a last resort anyhow. Substitutions provide more
flexibility with less obtrusive markup, albeit indirectly.

> 3. Previous rounds of the Doc-SIG have died partly
>    because people kept trying to jam things in.
>    (which isn't to say one shouldn't try to get it
>    right, but I just get a rather uncomfortable
>    feeling).

Substitutions (or equivalent, whatever the syntax) filled a gap in the
reStructuredText specification. Directives allow arbitrary block-level
structures. Substitutions allow arbitrary text-level (inline)
structures. Without them, every time someone wants a specific new
inline structure, they'd have to petition for a syntax change. With
them plus existing syntax, any inline structure can be coded without
new syntax (or with only directive-local syntax). This was the goal of
interpreted text roles also, but roles have limited functionality and
the syntax is obtrusive. They're most useful if the role can be
inferred by the system.

The provisional syntax I've chosen for substitutions isn't
particularly elegant, but that's OK: I don't expect substitutions to
be used often enough to be painfully noticeable. On the contrary, I
think noticeable syntax is appropriate for this construct.
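
As a rough illustration of the provisional syntax, here is a small
Python sketch that resolves substitution references against their
definitions. It is not the checked-in implementation: the function
name ``resolve``, the regular expressions, and the ``<directive
argument>`` placeholder rendering are all assumptions.

```python
import re

# Definition line, e.g. ".. /text/ If you're happy and you know it"
DEFINITION = re.compile(r"^\.\. /(?P<name>[^/]+)/ (?P<body>.+)$")
# Inline reference, e.g. "`/text/`"
REFERENCE = re.compile(r"`/(?P<name>[^/`]+)/`")


def resolve(source):
    """Return the source with definitions removed and references expanded."""
    definitions, body = {}, []
    for line in source.splitlines():
        match = DEFINITION.match(line)
        if match:
            definitions[match["name"]] = match["body"]
        else:
            body.append(line)

    def expand(match):
        name = match["name"]
        if name not in definitions:
            return match[0]  # leave unresolved references untouched
        replacement = definitions[name]
        directive, sep, argument = replacement.partition(":: ")
        # Hypothetical placeholder for a directive body; real output
        # would depend on the directive and the output format.
        return f"<{directive} {argument}>" if sep else replacement

    return REFERENCE.sub(expand, "\n".join(body))
```

The reference in the text stays short and unobtrusive; the definition,
wherever it sits, carries the detail.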

[Paul]
> To me, this is a serious uncomfortable feeling. I think reST *as it
> stands, right now* is "just right". I'm emphatically **not** saying
> that it is perfect for all application areas. But we'll break it in
> the attempt to "make it better" if we go *any* further. (IMHO)

With the addition of substitutions, I consider reStructuredText to be
pretty much complete. There are a couple of details remaining in
rst-notes.txt (multi-line titles, an external hyperlink mechanism with
a finer resolution, and ``\ `` as non-breaking space), but they're not
significant in the grand scheme.

> If people use it "naturally", it will start turning up in the oddest
> places[1]_[2]_.

A usage note: footnote references require a preceding space (or
brackets, etc.).

>        (I intend to start posting in reST on other lists, to
>        see the reaction...).

Good idea. How about adding a line to our signatures? ::

    Marked up with reStructuredText: http://structuredtext.sf.net/

> The Perl people (based on Larry Wall's thinking) tend to talk in
> terms of "memes", and view ideas as existing in a space where there
> is some sort of natural selection. In that context, I'd like to see
> reST taking over the ecological niche currently held by
> ``_underline_`` and ``*bold*`` (and "oh, heck - I don't know how to
> lay this out" :-)) That means that people need to be able to use it
> "by example" - just looking at other people's markup, and getting it
> right without manuals.

I think that's a possibility with the simpler, more often-used parts
of the markup.

> Tibs' refcard is the *absolute maximum* level of documentation that
> can be expected to capture this audience.

Actually, I think the quickref needs to be broken up (at least
internally) into "basic" and "advanced" parts, to make the
introduction easier. Perhaps each construct with an advanced aspect
should have an "Advanced Usage" subsection.

> (A pure-Python implementation might be better, but I'm getting
> bogged down in technical issues over tree walks, and as I say, I
> want it now...)

I'm about ready to put the parser to bed; enough fiddling already. It
does need lots of internal documentation and some refactoring, but
functionally it's complete. The next thing is to tackle a Reader
component and the transforms (including Ueli Schlaepfer's patch).

> This went on too long. But I'm convinced - we should freeze the
> design now, and work on implementation. We can't hit a moving
> target. (At the very least, the DOM needs to be frozen so that work
> on output has a firm basis...)

I agree with the sentiment, but even if the parser is frozen the
document tree model is still subject to change. Its functionality is
only construction-oriented (parser) now; as you well know, there's not
much support for transformations, tree walking, and whatever else
output needs.

I also want to take a good look at HappyDoc and others before
reinventing any more wheels.

[Tony]
> Yep, I'd agree (although bear in mind that *David* has *got* just
> about all of it implemented - people like me are the slow coaches -
> hmm, maybe we should insist he does documentation instead of
> coding!!!!)

And how are you going to make me? Coding is much more fun!

> Hmm - should we aim for a formal release (alpha? beta?) early in
> the new year? From here it looks doable...

If it's ready, we will. I won't commit any further than that.

-- 
David Goodger    goodger@users.sourceforge.net
Marked up with reStructuredText: http://structuredtext.sf.net/