[Doc-SIG] nested inline markup (was RE: Alternative inline markup)

David Goodger goodger@users.sourceforge.net
Fri, 09 Nov 2001 22:23:16 -0500


[I'm splitting up the discussions because they really are independent
issues. Keeping them in one thread is confusing. ("My brain hurts!") Please
keep the threads separated, or my brain will explode. Thanks.

("Oh my God he's burst his brain!")

This post contains replies to a bunch of other posts. I've tried to
pull them into some semblance of coherence, but there may be
redundancy, ramblings, and misquotes. Apologies in advance.]

[Alan]
> Nesting is a fundamental feature. It's not going to become easier to
> add it later. It's going to become more difficult. Meanwhile,
> attempts to get around the need to add it will complicate and
> clutter the language, while adding it now can simplify matters.

The problem is that in the what-you-see-is-more-or-less-what-you-get
markup language that is reStructuredText, the symbols used for inline
markup ("*", "**", "`", "``", etc.) may preclude nesting. I've thought
over how we might implement nested inline markup. The first algorithm
("first identify the outer inline markup as we do now, then
recursively scan for nested inline markup") won't work;
counterexamples were given in my last post.

The second algorithm makes my head hurt::

    while 1:
        scan for start-string
        if found:
            push on stack
            scan for start or end string
            if new start string found:
                recurse
            elif matching end string found:
                pop stack
            elif non-matching end string found:
                if it's a markup error:
                    generate warning
                elif the initial start-string was misinterpreted:
                    # e.g. in this case: ***strong** in emphasis*
                    restart with the other interpretation
                    # but it might be several layers back ...
        ...

This is similar to how the parser does section title recognition, but
sections are much more regular and deterministic.
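
For illustration only, the "restart with the other interpretation"
step can be sketched in Python. This is a toy, not the
reStructuredText parser: the names ``nests_cleanly`` and
``interpretations`` are made up for this example, only "*" and "**"
are handled, and the start-string context rules are ignored entirely.
It tries both readings of an opening "***" and reports which ones
close in stack order.

```python
import re


def nests_cleanly(text, opening_order):
    """Return True if every start-string in `text` closes in stack order.

    `opening_order` is the guess for an opening "***": either
    ("**", "*") (strong outside) or ("*", "**") (emphasis outside).
    """
    stack = []
    for piece in re.findall(r"\*+|[^*]+", text):
        if piece == "***":
            if stack:
                # A closing run has to close the innermost construct first.
                parts = tuple(reversed(stack[-2:]))
            else:
                parts = opening_order
        elif piece in ("*", "**"):
            parts = (piece,)
        else:
            continue  # plain text (longer asterisk runs are treated as text)
        for delim in parts:
            if stack and stack[-1] == delim:
                stack.pop()          # matching end-string
            elif delim in stack:
                return False         # non-matching end-string
            else:
                stack.append(delim)  # new start-string
    return not stack


def interpretations(text):
    """List the readings of an opening "***" that nest cleanly."""
    guesses = (("**", "*"), ("*", "**"))
    return [guess for guess in guesses if nests_cleanly(text, guess)]


print(interpretations("***strong** in emphasis*"))  # [('*', '**')]
print(interpretations("***both***"))  # both guesses survive: ambiguous
```

Even in this tiny form, the first guess for ``***strong** in
emphasis*`` has to be thrown away and retried, and ``***both***``
stays ambiguous no matter how often it is retried.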

Bottom line is, I don't think the benefits are worth the effort, even
if it is possible. I'm not going to try to write the code, at least
not now. If somebody codes up a consistent, working, general solution,
I'll be happy to consider it.

[Paul]
> I probably agree here - nesting is a fundamental issue. It's just
> that I disagree that that fact makes it necessary to support it. On
> the contrary, I'd say that lack of nesting is a distinguishing,
> simplifying, feature of the design.
> 
> You can't get around it - the language doesn't support nesting, and
> unless that is changed, it means that it simply isn't *possible* to
> use emphasized strong text in reST. More relevantly, it means that
> you can't emphasize parts of a hyperlink. This is a more realistic
> requirement, but I *still* don't see it as so earth-shattering that
> we have to accept either inconsistent or complex nesting rules (all
> options I've seen so far are one or the other of these...) just to
> support it.

Well said.

[Paul]
> That makes the nesting rules dependent on the markup characters
> involved. That's a *very* odd distinction to make - it implies that
> there is an argument for changing the markup for strong text to, say
> ``!strong!``, as that makes it nestable (!!).

Good point.

[Alan]
> > Do we lose anything? No. In the current spec you can't nest at
> > all. In the proposed spec there are happier alternatives for all
> > three, since ``*`` and ``**`` become mere sugar for tagged
> > content.

[Paul]
> We lose consistency, which is what I am arguing is crucial.

I agree.

> > The reason I said "sick" is because I don't know a semantic
> > meaning for "emphasized strong text" other than "the author
> > wants to demonstrate a case where nesting is difficult to
> > parse". :-)
> 
> Bold italic, in most web browsers. That's not to say I feel the need
> to support it, but it's a *perfectly* sensible thing.

We could add a new inline markup construct: ***strongemph***. Still
not nestable, but it gives us bold-italic. Then we could add
****fluorescent****, *****blinking*****, and ******SCREAMING****** as
well. Or use asterisk strings whose lengths are powers of 2:
****fluorescent****, ********blinking********,
****************SCREAMING****************. This would allow arbitrary
combinations (5 asterisks means fluorescent emphasis, 26 means strong
blinking screaming, etc.).

But we won't.

[Alan]
> That's not semantics or structure, it's presentation. Honestly, I
> don't mind the idea of having "bold" and "italic" tags in the
> language, but if structural purity is a goal, then we shouldn't
> treat "emphasis" and "strong emphasis" as euphemisms for "italic"
> and "bold". If "emphasized strong emphasis" has a meaning, it's not
> a terribly important one. :-)

That's a debate that has raged for years and never been properly
resolved as far as I know. My take on it is, "emphasis" is often
typographically represented by italics, but it could just as easily be
represented by the colour red or blinking or a different typeface or
by *asterisks*. The term "emphasis" doesn't tie you to any one
representation, so you're free to choose what suits best. Same for
"strong".

[Paul]
> But we disagree on whether people should be able to write reST
> without reading the spec. I believe that things should be deducible
> from examples, you feel that attempting to use markup without
> knowing the rules is foolish.

Learning by example without reference to the spec (or at least the
quickref) is possible, up to a point. Once the more advanced features
are encountered and used, the reference materials are a necessity.

> .. [1] By the way, you do realise that in advocating nesting,
>        you are making the construct::
> 
>            ```attribute:: Fred```
> 
>        which you just used, illegal?

No, it's the inline equivalent of a literal block:

    ::

        `attribute:: Fred`

>        At the moment it is a literal display of markup. What would
>        it be with nesting?

Same. Inline literals ("``") explicitly do no further processing of
their contents.

> .. [2] Boy, it's hard to discuss markup using marked up text...

Use inline literals or (especially in this case) literal blocks. And
when inline markup start-strings are not in a start-string context,
they're not recognized anyway. That's a feature!

[Alan]
> Is "markup can't be delimited with the same character as its parent"
> too complicated?

No, but it is limiting.

[Alan]
> > > 4) Inside markup delimited by backquotes or curly braces, curly
> > >    braces may be used as delimiters equivalent to backquotes::
> > ...
> > >    This is because backquotes don't nest.

[David]
> > There's no difference between backquotes and asterisks with regard
> > to nesting.

[Alan]
> True. Asterisks don't nest either. :-) I guess I wasn't clear. I
> should have said "inside tagged content, curly braces may be used to
> delimit tagged content". I'm referring solely to::
> 
>     `Putting {a tag}_ inside another tag`_

This is too much. -1 (just on braces for tags in tags, independent of
the "tagging reference" proposal).

[Alan]
> > > Summary:
> > > 
> > > - We gain nesting.

[David]
> > Not without significant work, though. If it's even possible
> > unambiguously, it can be added independently later.

[Alan]
> I don't mind writing code, but I'd rather not fork to do it.

No fork necessary. If it can be done cleanly, and if you or someone
else does code it, we'll incorporate it into the parser.

> Can we at least add support for inline nested markup to the DOM
> before freezing, even if the current parser doesn't support it, so
> those of us who want to add it can do so without breaking everything
> in existence? Surely that wouldn't be too difficult.

Not difficult, and valid for other markup syntaxes... done. See
"Inline Elements" in http://docstring.sf.net/spec/gpdi.dtd, and be
sure to read the caveats.

[Tony]
> As to adding the ability to the DTD, so that other implementations
> can do it (was that what Alan meant?)

I've added the ability to the DTD so that other *markups* can do
inline nesting, possibly reStructuredText as well. The DTD represents
the internal data structure, not the input markup. (Of course, I've
modified it based on the needs of reStructuredText, the only input
markup available. That, combined with my experience, intuition, and
desire for a generic data structure, has shaped the DTD.)

[Alan]
> I think being able to link anything - emphasized text, class and
> attribute names, other inline interpreted/tagged text - is a basic
> need. I don't think "you can't put emphasis inside strong emphasis"
> is too great an inconsistency to accept to support the vast majority
> of nesting cases.

It may be a basic need, but for now at least, it is a basic limitation
of reStructuredText that you can't do it.

[Alan]
> And yeah, it can be added later, but I *really* don't want to head
> down the road of writing reStructuredTextWithNesting while someone
> else writes reStructuredTextWithNestingDoneSomewhatDifferent and
> someone else writing reStructuredTextWithChocolateSprinkles and the
> whole incompatible dialect-proliferation fiasco that STX has gotten
> itself into.

I think the main reason for the StructuredText dialect proliferation,
which reStructuredText hopes to put an end to (but may be seen as just
another example of), is that there was no easy way to add your own
little extensions. reStructuredText is much richer in terms of
supplied constructs, and thanks to directives and substitutions
(thanks to Alan for lighting the fire under that last one),
application-specific extensions *are* easy to add.

[Tony]
> And my favourite example of why nested inline markup is going to be
> difficult to decide is now::
> 
>     ```something```
> 
> - is that an interpreted literal, or a literal with single
> backquotes at each end?

Something implicit in the parser and never spelled out explicitly in
the spec (will do so), is the "order of operations" for resolving
inline markup. "``" and "`/" and "_`" are checked before "`"; "**" is
checked before "*"; standalone URIs are checked for last of all.

So the above example resolves to::

    <literal>
        `something`
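
A rough sketch of that order of operations, in Python. The patterns
and the name ``first_inline`` are assumptions made for illustration
and are far looser than the real recognition rules; only four of the
constructs are covered. The point is just that the longer
start-strings are tried first.

```python
import re

# Checked top to bottom: "``" before "`", "**" before "*".
# The lookaheads keep an end-string from stopping one character early.
PATTERNS = [
    ("literal", re.compile(r"``(.+?)``(?!`)")),
    ("strong", re.compile(r"\*\*(.+?)\*\*(?!\*)")),
    ("interpreted", re.compile(r"`(.+?)`")),
    ("emphasis", re.compile(r"\*(.+?)\*")),
]


def first_inline(text):
    """Classify the inline construct that starts at the beginning of `text`.

    Returns a (kind, content) pair, or None if nothing matches.
    """
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.group(1)
    return None


print(first_inline("```something```"))  # ('literal', '`something`')
print(first_inline("**bold**"))         # ('strong', 'bold')
```

If "*" were tried before "**" with these same patterns,
``**bold**`` would come out as emphasis around ``*bold``, which is
why the order matters.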

> And if it is one, how does one get the other?

It seems to be impossible to get::

    <interpreted>
        <literal>
            something

Although::

    `were nesting allowed, ``something`` like this would be OK`

might result in::

    <interpreted>
        were nesting allowed,
        <literal>
            something
         like this would be OK

The text between the interpreted start-string (`) and the literal
start-string (``) would be enough to differentiate. It's the same
problem as ``***emphasized strong or strong emphasis?***``, except
that unlike emphasis & strong, interpreted & literal are not
commutative.
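
To make the hypothetical concrete, here is a small Python sketch of
how the intervening text could tell the two start-strings apart. No
such nesting exists in reStructuredText; ``parse_interpreted`` and
the tuple-based tree are invented for this example.

```python
import re


def parse_interpreted(text):
    """Parse `...` text, allowing ``literal`` spans inside it.

    Returns ("interpreted", children), or None when the text opens with
    "``", since that start-string is indistinguishable from a literal.
    """
    if len(text) < 3 or text.startswith("``"):
        return None
    if not (text.startswith("`") and text.endswith("`")):
        return None
    children = []
    # re.split with one capture group alternates plain text and captures.
    for index, part in enumerate(re.split(r"``(.+?)``", text[1:-1])):
        if index % 2:
            children.append(("literal", part))
        elif part:
            children.append(("text", part))
    return ("interpreted", children)


tree = parse_interpreted(
    "`were nesting allowed, ``something`` like this would be OK`"
)
print(tree)
print(parse_interpreted("```something```"))  # None
```

The first call yields text, literal, text under one interpreted
node; the second has no text between the start-strings, so the
sketch gives up on it.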

-- 
David Goodger    goodger@users.sourceforge.net    Open-source projects:
 - Python Docstring Processing System: http://docstring.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net
 - The Go Tools Project: http://gotools.sourceforge.net