Parsing of nested tags

Stefan Schwarzer s.schwarzer at ndh.net
Sat Mar 4 00:03:02 CET 2000


Hello :-)

Some time ago I wrote one of the zillion programs that read some
kind of format file(s) and make HTML from them. The program is able to
convert <<I italic text>> to <I>italic text</I> or <<LINK link;text>> to
<A HREF="link">text</A>.

However, currently I can't convert <<LINK link;<<I italic text>>>> to
<A HREF="link"><I>italic text</I></A>. The relevant code is

-----8<---------------------------------------------------------------

######################################################################
import re
import string

# perform substitutions
#   <<link url;url_text>>, url_text defaults to url
link_pattern = re.compile( '(?si)<<link (.+?)(?:;(.*?))?>>' )

def make_link( matchobj ):

    url, url_text = matchobj.groups()
    if not url_text:                    # use url as url_text by default
        url_text = url
    url, url_text = map( string.strip, [ url, url_text ] )

    return string.join( (
      html_format.link_format[ 0 ],
      url,
      html_format.link_format[ 1 ],
      url_text,
      html_format.link_format[ 2 ] ), '' )

# convert the formatting markup in the text to valid HTML
def make_html( text ):
    # order matters - conversion to links has to come first
    text = re.sub( link_pattern, make_link, text )
    text = re.sub( r'<<(\S+)\s(.*?)>>', r'<\1>\2</\1>', text )
    text = re.sub( r'(?i)<PROG>(.*?)</PROG>', r'<EM>\1</EM>', text )
    text = re.sub( r'(?i)<FILE>(.*?)</FILE>', r'<EM>\1</EM>', text )
    text = re.sub( r'(?i)<OPT>(.*?)</OPT>', r'<STRONG>\1</STRONG>', text )
    return text

-----8<---------------------------------------------------------------
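
To show where this goes wrong, here is a minimal check of what the link
pattern captures for the nested example. It is a sketch written for current
Python 3, not part of the original program; the names nested and leftover
are made up for the illustration.

```python
import re

# Same pattern as in the excerpt above.
link_pattern = re.compile('(?si)<<link (.+?)(?:;(.*?))?>>')

nested = '<<LINK link;<<I italic text>>>>'
match = link_pattern.search(nested)

# The lazy (.*?) stops at the first ">>", which belongs to the inner tag,
# so the link text keeps an unconverted "<<I" and the outer ">>" is left over.
url, url_text = match.groups()
leftover = nested[match.end():]
```

So the link text ends up as "<<I italic text", and the later substitutions
have nothing well-formed left to work on.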

Now the question: What is the best way to enable parsing of nested tags
like the one in the example above?

So far I have thought of two ways. One would be to extend the regular
expression(s), but they are already cumbersome to read. The other
would be to scan the string and replace those <<...>> occurrences
which don't contain <<, repeating this until all patterns are
substituted.
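
The second idea might look roughly like the sketch below, written for
current Python 3. Everything in it is an assumption made for illustration:
the names LT, GT, innermost_pattern, convert_tag and make_html_nested, the
hard-coded <A HREF="..."> format in place of html_format.link_format, and
the requirement that the input never contains the control characters
\x01 and \x02. The PROG/FILE/OPT replacements are left out.

```python
import re

# Stand-ins for "<" and ">" in generated HTML, so that a closing "</I>"
# directly followed by ">>" is not misread as ">>" plus a stray ">".
# Assumption: these control characters never occur in the input text.
LT, GT = '\x01', '\x02'

# An innermost tag: "<<NAME body>>" whose body contains neither "<<" nor ">>".
innermost_pattern = re.compile(r'(?s)<<(\w+)\s+((?:(?!<<|>>).)*)>>')

def convert_tag(matchobj):
    name, body = matchobj.groups()
    if name.upper() == 'LINK':
        # <<LINK url;url_text>>, url_text defaults to url
        url, _, url_text = body.partition(';')
        url = url.strip()
        url_text = url_text.strip() or url
        return f'{LT}A HREF="{url}"{GT}{url_text}{LT}/A{GT}'
    return f'{LT}{name}{GT}{body}{LT}/{name}{GT}'

def make_html_nested(text):
    # Each pass converts the tags that are innermost at that point;
    # stop once a pass finds nothing left to convert.
    while True:
        text, count = innermost_pattern.subn(convert_tag, text)
        if count == 0:
            break
    # Only now turn the stand-ins into real angle brackets.
    return text.replace(LT, '<').replace(GT, '>')

result = make_html_nested('<<LINK link;<<I italic text>>>>')
```

The stand-in characters are there because the converted inner tag ends in
">" and would otherwise sit directly in front of the outer ">>", which makes
the closing delimiter ambiguous on the next pass.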

I hope there is an easy way that I have simply overlooked 8-) .
Any suggestions are appreciated. Thank you in advance :) .

Stefan


