[Tutor] Regular Expression guru sought
Jeff Shannon
jeff@ccvcorp.com
Mon Aug 4 23:04:02 EDT 2003
Kirk Bailey wrote:
> ok, here's the rub;
> I am writing a wiki. A wiki stores the body of a page in a flat text
> file. NO HTML. The markup code is a simplistic set. Alas, some of
> them use the SAME symbol series to turn a feature on, or off- it is a
> toggle.
>
> SOME functions use the SAME tags to turn a function ON or OFF- it is a
> toggle.
>
> For instance:
> '''bold''' text - BOLD is printed BOLD, appears in output as
> <b>bold</b> text
> ''italic'' text - italic is printed in italics, as
> <i>italic</i> text
A crude (untested pseudo-code) possibility:
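    bold = 0
    tag = ["<B>", "</B>"]
    text = open( ... ).read()
    while text.find("'''") >= 0:
        # strings are immutable, so replace() returns a new string
        # that has to be assigned back to text
        text = text.replace("'''", tag[bold], 1)
        bold = not bold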
This should go through the entire file, finding each instance of three
single-quotes, and replacing *one* instance with a tag. The tag to use
is chosen by a toggle variable, which is flipped after each
replacement.
An obvious problem with this approach is that it searches through the
entire length of the file on each pass. For a long file with a bunch of
tags at the end, this could be a real bottleneck. An obvious
optimization would be to use the results of text.find() to separate
translated text from untranslated, something like:
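    complete = ''
    while True:
        index = text.find("'''")
        if index < 0:
            break
        # move everything before the marker, plus the tag, into complete
        complete = '%s%s%s' % (complete, text[:index], tag[bold])
        index += len("'''")
        text = text[index:]
        bold = not bold
    # whatever follows the last marker still has to be kept
    complete = complete + text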
That'll still be slow, but at least you won't be searching the same
sections of text over and over again. And through judicious use of
parameters and generalization, this could be converted into a function
that could be run for each tag-type -- i.e., translate(wikicode,
starttag, endtag).
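As a rough sketch of that generalization (not tested against a real wiki): the function below takes the page text as an extra first argument, which is my assumption, since the signature above only names the code and the two tags. The names tags, pieces, is_open and pos are made up for illustration. It tracks a position instead of re-slicing the text, and joins the pieces once at the end.

```python
def translate(text, wikicode, starttag, endtag):
    """Replace alternating occurrences of wikicode with starttag, endtag."""
    tags = (starttag, endtag)
    pieces = []
    is_open = False   # False: next marker opens; True: next marker closes
    pos = 0
    while True:
        index = text.find(wikicode, pos)
        if index < 0:
            break
        pieces.append(text[pos:index])   # text before the marker
        pieces.append(tags[is_open])     # bool indexes the tuple as 0 or 1
        pos = index + len(wikicode)      # resume the search after the marker
        is_open = not is_open
    pieces.append(text[pos:])            # text after the last marker
    return "".join(pieces)


print(translate("'''bold''' text", "'''", "<b>", "</b>"))
# prints: <b>bold</b> text
```

Like the loops above, this does nothing about a marker that is never closed.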
This is not very smart. It won't handle the unexpected well at all, and
it doesn't ensure that each opening tag has a matching closing tag.
Obviously, if a wikicode is a substring of another code, you'll have to
translate for the larger one first -- ''' must be done before '', for
example. There are probably other limitations too. But it might be
enough for a quick & dirty (but working) solution.
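To illustrate the ordering point and the unmatched-tag problem, here is a small sketch. The helper names toggle_replace and render are hypothetical, and it uses str.split() rather than the find() loop, so take it as one possible way to do it. It handles ''' before '' and reports whether every marker was paired; it makes no attempt to repair an unpaired one.

```python
def toggle_replace(text, wikicode, starttag, endtag):
    """Return (translated_text, balanced) for one marker type."""
    parts = text.split(wikicode)
    # N markers produce N + 1 chunks, so an odd chunk count means
    # an even number of markers, i.e. every opener has a closer.
    balanced = len(parts) % 2 == 1
    out = [parts[0]]
    for i, part in enumerate(parts[1:]):
        # 1st, 3rd, ... marker opens; 2nd, 4th, ... marker closes
        out.append((starttag if i % 2 == 0 else endtag) + part)
    return "".join(out), balanced


def render(text):
    """Apply the longer marker first so ''' is not eaten as '' plus '."""
    rules = (("'''", "<b>", "</b>"), ("''", "<i>", "</i>"))
    all_balanced = True
    for wikicode, starttag, endtag in rules:
        text, balanced = toggle_replace(text, wikicode, starttag, endtag)
        all_balanced = all_balanced and balanced
    return text, all_balanced


print(render("'''bold''' and ''italic'' text"))
# prints: ('<b>bold</b> and <i>italic</i> text', True)
```

A page with a dangling marker, such as "''oops", comes back with the flag set to False, so the caller can decide whether to warn or to leave the page untranslated.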
Jeff Shannon
Technician/Programmer
Credit International