[Tutor] Re: handling string!!

Thu Oct 23 20:27:55 EDT 2003

--- Andrei <project5 at redrival.net> wrote:
> Daniel Ehrenberg wrote:
> 
> <snip>
> > I have a somewhat related question. I am trying to write a program
> > to parse the simple markup language used at Wikipedia.org. For this
> > specific question, the markup is the same as in MoinMoin.
> 
> Kirk Bailey (who's around here too) has an open source Wiki at
> tinylist.org, written in Python:
> 
> http://www.tinylist.org/cgi-bin/wikinehesaed2.py
> 
> It handles this kinda thing quite well, I just tested it at the
> bottom of
> http://www.tinylist.org/cgi-bin/wikinehesaed2.py/SandBox.
> Perhaps you should look at its code.

It uses completely different markup. And anyway, it
didn't work.
> 
> > '''bold''' -> <strong>bold</strong>
> > ''italics'' -> <em>italics</em>
> > '''''bold and italics''''' -> <strong><em>bold and italics</em></strong>
> > '''''b & i'' b''' -> <strong><em>b & i</em> b</strong>
> > '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
> 
> <snip>
> 
> You just have to keep track of what you have open and apply the first
> open, last to close principle (use a list to which you append tags
> when you open them and then delete them when you close them starting
> from the last). In your 5th example:
>
> > '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
>
> your parser would e.g. first hit ''' (open <strong> and append it to
> the OpenTags list), then the '' (open <em> and append it to the
> OpenTags list). When it finds the closing ''', it tries to close the
> <strong>, but it notices in the OpenTags list that there are tags
> before it. It closes those first (in this case, the last tag in
> OpenTags is <em>, so it closes it first, but places it in a different
> list, say RestoreTags), then it closes the <strong> and reopens the
> ones in RestoreTags - obviously, these end up being on the OpenTags
> list again. The generated code is then:
> 
> <strong><em>b & i</em></strong><em> i</em>
> 
> Which is not perfect, but it's valid XHTML :). Making it really
> intelligent would be quite a bit harder, especially if you consider
> you might be nesting more tags.

Well, then I'd have to write the code completely
differently from how I wrote it in my previous message.
How do you think I could go about writing the actual
code using that method?
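
A minimal sketch of how the method described above might look in
Python, offered as one possible reading and not anyone's actual code.
The names wiki_to_xhtml, toggle, and TAGS are made up for illustration.
It assumes that a run of five quotes opens <strong> and then <em>, that
quote runs of any length other than 2, 3, or 5 pass through as literal
text, and that tags still open at the end of the input get closed. It
does not escape & or <.

```python
import re

# Quote-run length -> tags toggled, in opening order (an assumption).
TAGS = {2: ["em"], 3: ["strong"], 5: ["strong", "em"]}


def toggle(tag, open_tags, out):
    """Open tag if it is not open; otherwise close it, restoring inner tags."""
    if tag not in open_tags:
        open_tags.append(tag)
        out.append("<%s>" % tag)
        return
    restore = []
    # Close everything opened after this tag, remembering it for later.
    while open_tags[-1] != tag:
        inner = open_tags.pop()
        out.append("</%s>" % inner)
        restore.append(inner)
    open_tags.pop()
    out.append("</%s>" % tag)
    # Reopen the remembered tags in their original order.
    for inner in reversed(restore):
        open_tags.append(inner)
        out.append("<%s>" % inner)


def wiki_to_xhtml(text):
    out = []
    open_tags = []  # stack of currently open tags, last opened at the end
    pos = 0
    for match in re.finditer(r"'{2,}", text):
        out.append(text[pos:match.start()])
        pos = match.end()
        tags = TAGS.get(len(match.group()))
        if tags is None:
            out.append(match.group())  # unhandled run length: keep as text
            continue
        # Close already-open tags innermost first, then open the rest.
        closing = [t for t in reversed(open_tags) if t in tags]
        opening = [t for t in tags if t not in open_tags]
        for tag in closing + opening:
            toggle(tag, open_tags, out)
    out.append(text[pos:])
    while open_tags:  # close anything the writer left open
        out.append("</%s>" % open_tags.pop())
    return "".join(out)


print(wiki_to_xhtml("'''''b & i'' b'''"))
# <strong><em>b & i</em> b</strong>
print(wiki_to_xhtml("'''''b & i''' i''"))
# <strong><em>b & i</em></strong><em> i</em>
```

The second call reproduces the close-and-reopen output from the quoted
explanation: the <em> is closed before the <strong> and then reopened.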

> 
> I'm not sure this is the way Kirk's Wiki does it, but I know it would
> work because I use this same principle in my regular expression tool
> to highlight parentheses.
> 
Kirk's wiki uses different markers for opening quotes
and closing quotes.

> I'm wondering how you'd handle '''''' though (can be two bolds or
> three italics).
> 
> -- 
> Yours,
> 
> Andrei

Well, that would be an error on the writer's part, so
the resulting XHTML would probably also be erroneous :)
I don't want to spend too much time checking for
errors, as the original MediaWiki (Wikipedia's
PHP/MySQL implementation) software doesn't either, and
I only want compatibility with it, not extra features
(except for an added GUI).
LDan
