[Tutor] Re: handling string!!
Daniel Ehrenberg
littledanehren at yahoo.com
Thu Oct 23 20:27:55 EDT 2003
--- Andrei <project5 at redrival.net> wrote:
> Daniel Ehrenberg wrote:
>
> <snip>
> > I have a somewhat related question. I am trying to
> > write a program to parse the simple markup language
> > used at Wikipedia.org. For this specific question,
> > the markup is the same as in MoinMoin.
>
> Kirk Bailey (who's around here too) has an open
> source Wiki at
> tinylist.org, written in Python:
>
> http://www.tinylist.org/cgi-bin/wikinehesaed2.py
>
> It handles this kinda thing quite well, I just
> tested it at the bottom of
> http://www.tinylist.org/cgi-bin/wikinehesaed2.py/SandBox.
> Perhaps you should look at its code.
It uses completely different markup. And anyway, it
didn't work.
>
> > '''bold''' -> <strong>bold</strong>
> > ''italics'' -> <em>italics</em>
> > '''''bold and italics''''' -> <strong><em>bold and italics</em></strong>
> > '''''b & i'' b''' -> <strong><em>b & i</em> b</strong>
> > '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
>
> <snip>
>
> You just have to keep track of what you have open
> and apply the first open, last to close principle
> (use a list to which you append tags when you open
> them and then delete them when you close them,
> starting from the last). In your 5th example:
>
> > '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
>
> your parser would e.g. first hit ''' (open <strong>
> and append it to the OpenTags list), then the ''
> (open <em> and append it to the OpenTags list).
> When it finds the closing ''', it tries to close the
> <strong>, but it notices in the OpenTags list that
> there are tags opened after it. It closes those
> first (in this case, the last tag in OpenTags is
> <em>, so it closes it first, but places it in a
> different list, say RestoreTags), then it closes
> the <strong> and reopens the ones in RestoreTags -
> obviously, these end up being on the OpenTags list
> again. The generated code is then:
>
> <strong><em>b & i</em></strong><em> i</em>
>
> Which is not perfect, but it's valid XHTML :).
> Making it really intelligent would be quite a bit
> harder, especially if you consider you might be
> nesting more tags.
Well, then I'd have to write the code completely
differently from how I wrote it in my previous message.
How do you think I could go about writing the actual
code using that method?
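
For reference, here is a minimal sketch of how the open-tags
method described above might look in Python. It is only an
illustration under some assumptions, not how MediaWiki or Kirk's
wiki actually does it: the names wiki_to_xhtml, toggle, MARKUP and
TOKEN are made up for this example, ''''' is treated as ''' followed
by '', and no escaping of characters such as & is attempted.

```python
import re

# Quote runs mapped to the tags they toggle; ''''' counts as ''' followed by ''.
MARKUP = {"'''''": ["strong", "em"], "'''": ["strong"], "''": ["em"]}
TOKEN = re.compile(r"('{5}|'{3}|'{2})")


def toggle(tag, open_tags, out):
    """Open tag if it is closed; otherwise close it, restoring inner tags."""
    if tag not in open_tags:
        open_tags.append(tag)
        out.append("<%s>" % tag)
        return
    restore = []
    # Close everything opened after `tag`, last opened first.
    while open_tags[-1] != tag:
        inner = open_tags.pop()
        out.append("</%s>" % inner)
        restore.append(inner)
    open_tags.pop()
    out.append("</%s>" % tag)
    # Reopen the tags that were closed early, in their original order.
    for inner in reversed(restore):
        open_tags.append(inner)
        out.append("<%s>" % inner)


def wiki_to_xhtml(text):
    open_tags, out = [], []
    for piece in TOKEN.split(text):
        if piece not in MARKUP:
            out.append(piece)  # plain text; no escaping is attempted here
            continue
        tags = MARKUP[piece]
        # Close the most recently opened tag first to avoid empty elements.
        if len(tags) == 2 and open_tags and open_tags[-1] == "em":
            tags = ["em", "strong"]
        for tag in tags:
            toggle(tag, open_tags, out)
    # Close anything the writer left open so the output stays well formed.
    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "".join(out)


print(wiki_to_xhtml("'''''b & i''' i''"))
```

For the 5th example this prints
<strong><em>b & i</em></strong><em> i</em>, i.e. the close-and-reopen
form rather than the <em><strong>...</strong> i</em> nesting listed
in the examples. Getting that nesting would mean looking ahead to
see which tag closes first.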
>
> I'm not sure this is the way Kirk's Wiki does it,
> but I know it would work because I use this same
> principle in my regular expression tool to
> highlight parentheses.
>
Kirk's wiki uses different things for opening quotes
and closing quotes.
> I'm wondering how you'd handle '''''' though (can be
> two bolds or three italics).
>
> --
> Yours,
>
> Andrei
Well, that would be an error on the writer's part, so
the resulting XHTML would probably also be erroneous :)
I don't want to spend too much time checking for
errors, as the original MediaWiki software (Wikipedia's
PHP/MySQL implementation) doesn't either, and
I only want compatibility with it, not extra features
(except for an added GUI).
LDan