[Doc-SIG] Adding new inline markup?

David Goodger goodger@python.org
Tue, 17 Dec 2002 19:54:42 -0500

[Andrew Kuchling]
> RST only supports a few different inline markup notations, such as *
> and ** for emphasis, ` for interpreted things, &c.  For my
> application I'd like to add some more inline markups, such as /cited
> text/.

> [This is probably better handled on docutils-developers, but I'll
> let David make that decision.]

No biggie.  Overlap is inevitable.

> Why can't you use ` for cited text?  Remember that you're allowed to
> have different kinds of interpreted text::
>     :cite:`cited text`

[Andrew Kuchling]
> Oh, I wasn't aware of that!  Thanks!  It would be marginally easier
> for the intended audience if a simpler notation like /cited/ was
> possible, but I can live with using the role notation, and it means
> I don't need to pick more typographic symbols for everything.  (%per
> se% for foreign text, @DARPA@ for acronyms, ad nauseam...)

The intention of interpreted text roles is to allow new inline
descriptive markup, with the simultaneous advantage and disadvantage
of being explicit.  If your application has one "main" role, that can
be the default (i.e. no explicit role required, just `backquotes`).
This area hasn't been explored much, nor has any support code been
written.  For example, I'm not sure when to validate roles and process
the interpreted text: in the parser, in the reader, or in a transform.
It could be that the "interpreted" element may disappear from the
Docutils internal doctree, just as the "directive" element did.
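
To make the idea concrete, here is a minimal standalone sketch of
role dispatch with a default role.  It is not Docutils code: the
names ``ROLE_HANDLERS``, ``DEFAULT_ROLE``, and ``render_roles`` are
hypothetical, the output tags are made up for illustration, and the
pattern is simplified (it ignores inline literals and suffix roles).

```python
import re

# Hypothetical registry mapping role names to handler functions.
# This is an illustration only, not a Docutils API.
ROLE_HANDLERS = {
    "cite": lambda text: "<cite>%s</cite>" % text,
    "acronym": lambda text: "<acronym>%s</acronym>" % text,
}

# Assumed "main" role: plain `backquotes` are treated as this role.
DEFAULT_ROLE = "cite"

# Matches :role:`text` (explicit role) or `text` (default role).
INTERPRETED = re.compile(
    r"(?::(?P<role>[A-Za-z][\w-]*):)?`(?P<text>[^`]+)`")


def render_roles(source):
    """Replace each interpreted text span using its role's handler."""
    def replace(match):
        role = match.group("role") or DEFAULT_ROLE
        try:
            handler = ROLE_HANDLERS[role]
        except KeyError:
            # Validation happens here, at processing time; where it
            # should really happen is the open question above.
            raise ValueError("unknown interpreted text role: %r" % role)
        return handler(match.group("text"))
    return INTERPRETED.sub(replace, source)
```

With this sketch, ``render_roles("See :acronym:`DARPA`.")`` goes
through the "acronym" handler, while text in plain backquotes falls
back to the "cite" handler.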

> For acronyms use | (pipe), assuming you want it expanded.

Pipes are used for |substitutions|, which are like inline directives,
allowing graphics and arbitrary constructs within text.  Replacing an
acronym with its full text is one application.  See

[Andrew Kuchling]
> Should I expect to be able to subclass Inliner in order to add new
> notations?

If necessary, yes, but it hasn't been necessary yet, so that
functionality hasn't been added (XP's "add no functionality before its

> Right now that's rather messy.  Inliner.dispatch is a
> dictionary mapping symbols to handler methods, but the large regular
> expression stored as Inliner.parts doesn't take this into account.
> A possible way of handling this would be to add an internal method
> _get_initial_pattern() to the Inliner class that synthesized the
> 'parts' regular expression, using self.dispatch.keys() to match all
> the listed inline markups.

Way ahead of you ;).  Look again, and you'll see that
``Inliner.parts`` isn't a regexp, it's a data structure that's used to
synthesize a regexp.  ``Inliner.patterns.initial`` is built by the
``build_regexp`` function (which see for a description of the data
structure).  This issue *has* come up before, WRT embedded URIs, and
although the support wasn't used for that, it did simplify the regular
expression (you should've seen it before!).

A subclass should be able to extend (or replace) this data structure
and re-synthesize the regexp.
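
A rough sketch of that approach, under stated assumptions: the
``build_pattern`` function and the ``MiniInliner`` and
``CitingInliner`` classes below are simplified, hypothetical
stand-ins, not the actual Docutils ``build_regexp`` or ``Inliner``.
The (name, prefix, suffix, parts) tuple layout is assumed for
illustration only.

```python
import re


def build_pattern(definition):
    """Synthesize regexp source from (name, prefix, suffix, parts).

    Each item in `parts` is either a regexp string or a nested
    definition tuple; the items are joined as alternatives inside a
    named group.
    """
    name, prefix, suffix, parts = definition
    alternatives = [
        build_pattern(part) if isinstance(part, tuple) else part
        for part in parts
    ]
    return "%s(?P<%s>%s)%s" % (prefix, name, "|".join(alternatives), suffix)


class MiniInliner:
    # Data structure describing the start-strings, not a regexp itself.
    parts = ("start", "", "", [r"\*\*", r"\*", r"`"])
    # Maps each matched start-string to a (hypothetical) handler name.
    dispatch = {"**": "strong", "*": "emphasis", "`": "interpreted"}

    def __init__(self):
        # The regexp is synthesized per instance, so a subclass that
        # overrides `parts` gets its own pattern automatically.
        self.initial = re.compile(build_pattern(self.parts))

    def first_markup(self, text):
        """Return the handler name for the first start-string found."""
        match = self.initial.search(text)
        if match is None:
            return None
        return self.dispatch[match.group("start")]


class CitingInliner(MiniInliner):
    # Extend the data structure with "/" and add a matching handler.
    parts = MiniInliner.parts[:3] + (MiniInliner.parts[3] + [r"/"],)
    dispatch = {**MiniInliner.dispatch, "/": "citation"}
```

Here ``CitingInliner().first_markup("a /cited/ word")`` finds the
new "/" start-string, while ``MiniInliner`` does not recognize it.
The subclass only touches data; the pattern is rebuilt from it.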

> Inliner is fairly complicated, though, so maybe there are additional
> changes that would be necessary.

Probably :).  Limitations are often discovered when the code is
exercised in novel and interesting ways.

David Goodger  <goodger@python.org>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/