[Tutor] Regular Expression guru sought

Jeff Shannon jeff@ccvcorp.com
Mon Aug 4 23:04:02 EDT 2003


Kirk Bailey wrote:

> ok, here's the rub;
> I am writing a wiki. A wiki stores the body of a page in a flat text 
> file. NO HTML. The markup code is a simplistic set. Alas, some of 
> the functions use the SAME symbol series to turn a feature on or off; 
> it is a toggle.
>
> For instance:
> '''bold''' text - "bold" is printed in bold, appears in output as
> <b>bold</b> text
> ''italic'' text - "italic" is printed in italics, as
> <i>italic</i> text


A crude (untested pseudo-code) possibility:

bold = 0
tag = ["<B>", "</B>"]
text = open( ... ).read()
while text.find("'''") >= 0:
    # strings are immutable, so replace() returns a new string
    text = text.replace("'''", tag[bold], 1)
    bold = not bold

This should go through the entire file, finding each instance of three 
single-quotes, and replacing *one* instance with a tag.  The tag to use 
is chosen through a toggle variable, which is flipped after each 
replacement.

An obvious problem with this approach is that it searches through the 
entire length of the file on each pass.  For a long file with a bunch of 
tags at the end, this could be a real bottleneck.  An obvious 
optimization would be to use the results of text.find() to separate 
translated text from untranslated, something like:

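complete = ''
while 1:
    index = text.find("'''")
    if index < 0:
        break
    # keep everything before the marker, then add the tag
    complete = '%s%s%s' % (complete, text[:index], tag[bold])
    # drop the translated part so it is never searched again
    text = text[index + len("'''"):]
    bold = not bold
# append whatever follows the last marker
complete = complete + text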

That'll still be slow, but at least you won't be searching the same 
sections of text over and over again.  And through judicious use of 
parameters and generalization, this could be converted into a function 
that could be run for each tag-type -- i.e., translate(wikicode, 
starttag, endtag).
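
As a rough sketch of that generalization (my own variant, not tested 
against a real wiki): the parameter names follow the translate(wikicode, 
starttag, endtag) idea above, with the text added as a first argument. 
It collects pieces in a list and passes a start position to find() 
instead of re-slicing the text, which is an assumption about what would 
be fast enough, not a measured result.

```python
def translate(text, wikicode, starttag, endtag):
    """Replace alternating occurrences of wikicode with starttag, endtag."""
    tags = (starttag, endtag)
    pieces = []        # translated fragments, joined once at the end
    is_open = False    # toggle: False -> emit starttag, True -> emit endtag
    pos = 0            # everything before pos has already been translated
    while True:
        index = text.find(wikicode, pos)
        if index < 0:
            break
        pieces.append(text[pos:index])
        pieces.append(tags[is_open])
        is_open = not is_open
        pos = index + len(wikicode)
    pieces.append(text[pos:])  # whatever follows the last marker
    return "".join(pieces)
```

Calling translate("'''bold''' text", "'''", "<b>", "</b>") should give 
"<b>bold</b> text".  Like the loops above, it does nothing about an 
opening marker that is never closed.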

This is not very smart.  It won't handle the unexpected well at all, and 
it doesn't ensure that each opening tag has a matching closing tag. 
Obviously, if a wikicode is a substring of another code, you'll have to 
translate for the larger one first -- ''' must be done before '', for 
example.  There are probably other limitations too.  But it might be 
enough for a quick & dirty (but working) solution.
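
Since the subject line asks about regular expressions, here is one 
possible sketch of the same translation with the re module.  A 
non-greedy pattern only matches when both the opening and the closing 
marker are present, so an unpaired marker is left alone.  The names 
wiki_to_html, BOLD and ITALIC are made up for illustration, and the 
lowercase tags follow the examples in the question.  It assumes markup 
never spans a line break, because "." does not match a newline by 
default.

```python
import re

# Longer marker first: ''' has to be handled before ''.
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")

def wiki_to_html(text):
    """Translate paired ''' and '' markers into <b> and <i> tags."""
    text = BOLD.sub(r"<b>\1</b>", text)
    text = ITALIC.sub(r"<i>\1</i>", text)
    return text
```

With this, wiki_to_html("'''bold''' and ''italic''") should return 
"<b>bold</b> and <i>italic</i>", while "an unclosed '''marker" comes 
back unchanged.  Mixed or nested markers are still not handled in any 
careful way.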

Jeff Shannon
Technician/Programmer
Credit International





