[Tutor] parsing--is this right?

Paul Tremblay phthenry@earthlink.net
Mon, 10 Jun 2002 23:21:15 -0400


On Mon, Jun 10, 2002 at 03:02:37PM -0700, Danny Yoo wrote:

> 
> Hmmm... if the structure is always described by braces, perhaps it might
> be easier to do a "recursive descent" parse through the message.  Here's a
> sample parser that shows how the technique works.
> 

First, thanks for your post, and thanks also to Sean and Derrick.
Sean made a nice distinction between a parser and a lexer. Derrick
suggested I use a parser that has already been written. In fact, I
haven't been able to find a parser in Python to parse RTF. The only
thing I found was in Java, a nice utility called majix. However, for
some bizarre reason, this utility eliminates footnotes rather
than tagging them.

Danny, I had a few questions about how all this code works. Keep
in mind I'm new to Python. I see that you pass a list of tokens
to the parser method in the class Chunk.

(1) I don't understand how this method continues to read each
item in the list.

(2) I don't understand why you don't have to create an object
first to use this method.

(3) I don't understand how you can call on the Chunk method, when
it is the name of a class.

(4) I don't understand how this code would work if the tokens
were broken over lines. I guess you could read the file in as
lines and set your example text to lines. 
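
For readers following along, here is a minimal sketch of the general
technique behind questions (1) and (4). It is not Danny's code: the
names TOKEN_RE, tokenize, and parse are hypothetical, and the token
rules are a simplification that ignores real RTF escapes. The idea
is that the parser carries a position through the token list, calls
itself on "{", and returns on "}" so the caller can resume from where
the nested call stopped. Because the tokenizer treats newlines as
ordinary whitespace, text broken over lines yields the same tokens.

```python
import re

# Assumed token rules: a single brace, or a run of characters that are
# neither braces nor whitespace. Real RTF needs more careful handling.
TOKEN_RE = re.compile(r"[{}]|[^{}\s]+")

def tokenize(text):
    """Split text into brace and word tokens; newlines count as whitespace."""
    return TOKEN_RE.findall(text)

def parse(tokens, pos=0):
    """Parse tokens from index pos; return (nested_list, next_position)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            # Recurse for the nested group, then resume where it stopped.
            group, pos = parse(tokens, pos + 1)
            items.append(group)
        elif tok == "}":
            # End of the current group: hand control back to the caller.
            return items, pos + 1
        else:
            items.append(tok)
            pos += 1
    return items, pos

# The input may span several lines without changing the result.
tree, end = parse(tokenize("{\\rtf1 {\\footnote some\ntext} body}"))
```

On questions (2) and (3): in Python, calling a class by name, as in
Chunk(...), is how an object is created, so a call like that builds
the object and uses it in one step.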

I posted my code, written in Plex, a few emails ago with the
subject "rtf to xml regexp question." Despite having to count
brackets, that code is easier for me to understand. It seems that
my code lets the lexer do most of the dirty work.
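
To make the bracket-counting idea concrete, here is a rough sketch in
plain Python. It is not the Plex code from that earlier message: the
function name extract_footnotes and the token pattern are assumptions
made for illustration, and the sketch does not handle a footnote
nested inside another footnote. It walks the tokens once, keeps a
running brace depth, and remembers the depth at which a \footnote
group opened so it knows which closing brace ends that group.

```python
import re

# Assumed token rules, as a simplification of real RTF.
TOKEN_RE = re.compile(r"[{}]|[^{}\s]+")

def extract_footnotes(text):
    """Collect the tokens inside each {\\footnote ...} group by counting braces."""
    footnotes = []
    depth = 0
    footnote_depth = None  # brace depth at which the open footnote started
    current = []
    for tok in TOKEN_RE.findall(text):
        if tok == "{":
            depth += 1
        elif tok == "}":
            if footnote_depth is not None and depth == footnote_depth:
                # This brace closes the footnote group itself.
                footnotes.append(" ".join(current))
                footnote_depth, current = None, []
            depth -= 1
        elif tok == "\\footnote" and footnote_depth is None:
            footnote_depth = depth
        elif footnote_depth is not None:
            current.append(tok)
    return footnotes

notes = extract_footnotes("{\\rtf1 body {\\footnote a {nested} note} more}")
```

The trade-off compared with a recursive parse is that the depth
bookkeeping is explicit here, whereas recursion keeps track of it
implicitly through the call stack.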

> I've had some experience with the SPARK parser, and it's a nice tool:
> 
>     http://pages.cpsc.ucalgary.ca/~aycock/spark/
> 
> I used it a while back when I was playing with Pythonica, and it seemed to
> work very well.  The only caveat I could make was that, back then, if
> there was an error in the grammar, SPARK wouldn't give good error
> messages.  I'm not sure if this complaint can be made now: perhaps this
> has been fixed.
> 
> 
> 
> Another parser that I've heard good things about is Amit Patel's YAPPS
> parser:
> 
>     http://theory.stanford.edu/~amitp/Yapps/
> 
> 
> 

Yes, I've heard some good things about SPARK. I couldn't find any
documentation on it, though! I also found a good parser called
SimpleParse, which is built on mxTextTools, a module written in C.
SimpleParse has a very nice interface. However, I got an error
message when running code taken straight from a tutorial.

There is actually a Python special interest group (SIG) just for
parsing. It was established in February, and only one person has
posted to it.

Thanks!

Paul

-- 

************************
*Paul Tremblay         *
*phthenry@earthlink.net*
************************