[Doc-SIG] References in the same line as the target text

David Goodger goodger@users.sourceforge.net
Thu, 04 Jul 2002 22:39:16 -0400


Simon Budig wrote:
> First I am not sure if the use of pragmas to change the behaviour is
> a good way to do this. There might be a need for lots of different
> local extensions to the syntax. You'd end up implementing lots of
> pragmas...
> 
> It might be better to have either a pragma that looks like::
> 
>    .. reST-options::
>         :inline-urls: true
>         :math-markup: true
>         :whatever-id: "blah"

Lots of individual directives, or one large pragma directive with
subcommands.  Either way would be fine.  I'd drop the "true" though;
just the presence of the field is enough.

> reST could provide a mechanism to derive for example class names
> from the first field and try to import and plug them into the parser.

Too much magic; potentially dangerous.  Better to have a registry.
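
A minimal sketch of the registry idea, for illustration only.  The
names here (PLUGIN_REGISTRY, plugins_for, and the stand-in
InlineUrlPlugin class) are hypothetical and not part of Docutils; the
point is only that an option name is looked up in an explicit table
instead of being turned into a class name and imported.

```python
class InlineUrlPlugin:
    """Stand-in for a syntax extension; hypothetical, not a Docutils class."""

    def __init__(self, enabled=True):
        self.enabled = enabled


# Explicit table: only names listed here can ever be activated.
PLUGIN_REGISTRY = {"inline-urls": InlineUrlPlugin}


def plugins_for(option_names):
    """Return plugin instances for registered option names only."""
    plugins = []
    for name in option_names:
        try:
            factory = PLUGIN_REGISTRY[name]
        except KeyError:
            # Unknown names are rejected; nothing is imported on demand.
            raise ValueError(f"unknown syntax option: {name!r}") from None
        plugins.append(factory())
    return plugins
```

With this arrangement a document can only switch on extensions that
the processor has registered beforehand.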

> This would also make it easier to avoid having to type this pragma
> by creating customized document processors where you would do
> something like
> 
> parser.add_plugin (InlineUrlPlugin (1))

Yes, something along those lines.  But please don't worry about the
mechanics; it's too early.

> The second point is closely connected to this. When looking at
> Inline markup the parsing work is done by a class "Inliner". This is
> dominated by a huge regular expression that matches to a lot of
> different constructs. In my eyes it would be better to break this
> apart in different regular expressions and test them in a sequence
> (it might be necessary to remember which match starts first). An
> extension could add a regular expression to that list instead of
> having to replace a complicated regular expression with an even more
> complicated regex.

The "Inliner" class has to use one large regular expression.  If we
have some text like this::

    Here is an ``inline **literal**``.

If we check for "strong" (**) first, the result will be wrong: the
"**literal**" inside the inline literal would be taken as strong
text.  No ordering would get it right for all constructs.  We have to
check for each start-string simultaneously, because there are (almost)
no precedence rules; the first start-string found, reading the text
from left to right, determines the construct.
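
The difference can be sketched with Python's "re" module.  The two
patterns below are deliberately simplified stand-ins, not the real
expressions from the Inliner class, and the helper names are made up
for this example.  Testing the regexps one after another fails for
one input or the other, whichever order is chosen; a single
alternation picks whichever construct starts first.

```python
import re

# Simplified stand-ins for two inline constructs (assumptions, not the
# actual Docutils patterns).
STRONG = r"\*\*(?P<strong>.+?)\*\*"
LITERAL = r"``(?P<literal>.+?)``"


def first_by_sequence(text, sources):
    """Try each regexp in turn; the first one that matches anywhere wins."""
    for source in sources:
        match = re.search(source, text)
        if match:
            return match.lastgroup, match.group(match.lastgroup)
    return None


def first_by_position(text, sources):
    """Join the regexps into one OR-group; the leftmost match wins."""
    match = re.search("|".join(sources), text)
    if match:
        return match.lastgroup, match.group(match.lastgroup)
    return None


TEXT_A = "Here is an ``inline **literal**``."
TEXT_B = "A **strong ``word``** here."

# Checking "strong" first misreads TEXT_A ...
print(first_by_sequence(TEXT_A, [STRONG, LITERAL]))
# ... and checking "literal" first misreads TEXT_B.
print(first_by_sequence(TEXT_B, [LITERAL, STRONG]))
# The combined regexp handles both, whatever the order of the parts.
print(first_by_position(TEXT_A, [STRONG, LITERAL]))
print(first_by_position(TEXT_B, [STRONG, LITERAL]))
```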

But that idea is close to the solution I have in mind: break up the
one huge regexp into several lists of individual regexps, one list per
construct/regexp type (find start-string only, find the whole
construct, etc.), and join them dynamically into compound OR-groups,
building the large regexp from components at runtime.  Dynamic syntax
directives can then install new regexps and rebuild the master regexp.
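
A rough sketch of that arrangement, under simplifying assumptions: a
single table of components instead of one list per regexp type, toy
patterns, and a hypothetical class name (PatternRegistry) that does
not exist in Docutils.  Each component becomes a named group, the
groups are joined with "|", and the master regexp is recompiled
whenever a new component is installed.

```python
import re


class PatternRegistry:
    """Hypothetical holder for regexp components and the compiled master."""

    def __init__(self):
        self._sources = {}   # construct name -> regexp source text
        self.master = None   # compiled OR-group, rebuilt on every install

    def install(self, name, source):
        """Add or replace one component, then rebuild the master regexp."""
        self._sources[name] = source
        self.rebuild()

    def rebuild(self):
        """Join all components into one compound OR-group."""
        parts = [f"(?P<{name}>{src})" for name, src in self._sources.items()]
        self.master = re.compile("|".join(parts))

    def first_construct(self, text):
        """Return (name, matched text) of the leftmost construct, or None."""
        if self.master is None:
            return None
        match = self.master.search(text)
        return (match.lastgroup, match.group()) if match else None


registry = PatternRegistry()
registry.install("literal", r"``.+?``")
registry.install("strong", r"\*\*.+?\*\*")

sample = "See http://example.org or ``code``."
print(registry.first_construct(sample))

# A dynamic syntax directive could install a new component at runtime.
registry.install("url", r"https?://\S+")
print(registry.first_construct(sample))
```

Component names must be valid Python identifiers here because they
are used as group names; a real implementation would need its own
naming scheme.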

> Of course this would mean that there *would* be changes to the
> parser itself, but it might result in a more flexible parsing
> framework.

This is the infrastructure support I spoke of.

For now, please just make a subclass of the "Inliner" class and pass
it to the parser.  See the PEP reader for an example.  Don't try to be
fancy, just brute-force copy & paste the code you need from
docutils.parsers.rst.states.Inliner; we'll sort out what needs to be
done afterward.

Please put your code in the sandbox for now (see
http://docutils.sf.net/spec/notes.html#additions-to-docutils).

-- 
David Goodger  <goodger@users.sourceforge.net>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/