Bolting a Bison/Flex parser into Python

Thu May 11 19:58:41 EDT 2000

>>>>> "Clint" == Clint Olsen <olsenc at kodiak.ee.washington.edu> writes:

Clint> Hello: I have this desire to write a parser using GNU Bison and
Clint> then allow people to write Python code to use the objects built
Clint> by the parser.  I'm not quite sure how to design this, and I'm
Clint> not sure if I'm approaching the problem in the correct way.

[snip]

Clint> My motivation for doing this is to create a rock solid parser
Clint> using Bison (I've had experience with this before), and using
Clint> the Python/C API to allow Python programmers to write
Clint> programs using lists and dictionaries I create.

Clint> This allows me to accomplish two things:

Clint> 1) I have a C interface to write programs to hook up to my
Clint> parser should I need the speed of C.
Clint> 2) People who don't want to get bogged down by C++/C can use a
Clint> higher-level interpreted language to get work done much quicker
Clint> using the same parser.  I don't have to maintain two parsers.

[snip]

Clint> So, my question is this: Is my objective reasonable, and can
Clint> you point me in a general direction in the Python documentation
Clint> for accomplishing this task?  I don't think embedding the
Clint> Python interpreter into my C program is necessarily what I
Clint> want, but perhaps someone here has had some experience with
Clint> this type of application.  This will keep me from making poor
Clint> decisions from the beginning that could make the task more
Clint> complicated.

I cannot say whether it is a reasonable way to go, because you have
not provided enough information to make that decision.

I imagine that it would not be too hard to achieve what you want to
do.  One of the big reasons for using bison is that the parser is
going to be pretty fast.  Certainly much faster than the same thing in
Python.

You did not mention the structure, volume, and typical usage patterns
of your data, so it is a bit hard to make a good recommendation.  If
you are dealing with a large amount of data of which applications
typically access less than 10%, then you could benefit from a
technique I have used here to speed up Python programs by an order of
magnitude.

By loading all of the parse tree into native C structures, you retain
compatibility with the non-Python applications.  You could then wrap
those native C structures with Python proxies on demand.  If the
Python program does not access an attribute, then do not create it.
References to child parse tree nodes are handled in exactly the same
way.  When a reference to a child node is used, create the proxy for
the child and cache the reference in the proxy for the parent node.
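
To make the idea concrete, here is a rough sketch in pure Python.  It
is only an illustration under stated assumptions: NativeNode stands in
for whatever C struct the Bison actions would build, and the names
NativeNode, NodeProxy, and child() are made up for this example.  A
real version would implement the proxy as an extension type through
the Python/C API, but the lazy creation and caching would follow the
same shape.

```python
class NativeNode:
    """Stand-in for a C parse-tree struct (hypothetical; the real one
    would live in C and be filled in by the parser)."""
    __slots__ = ("kind", "value", "children")

    def __init__(self, kind, value=None, children=()):
        self.kind = kind
        self.value = value
        self.children = tuple(children)


class NodeProxy:
    """Python-side wrapper, created only when a node is actually used."""
    created = 0  # counts proxies, to show how little gets wrapped

    def __init__(self, native):
        self._native = native
        self._child_cache = {}
        NodeProxy.created += 1

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails, so it runs
        # on the first access to an attribute and never again.
        if name in ("kind", "value"):
            result = getattr(self._native, name)
            self.__dict__[name] = result  # cache for later reads
            return result
        raise AttributeError(name)

    def child(self, index):
        # Build the child proxy on first use and keep it in the parent.
        proxy = self._child_cache.get(index)
        if proxy is None:
            proxy = NodeProxy(self._native.children[index])
            self._child_cache[index] = proxy
        return proxy

    def __len__(self):
        return len(self._native.children)


# A root with 1000 children; only the nodes we touch get a proxy.
tree = NativeNode("list", children=[NativeNode("num", i) for i in range(1000)])
root = NodeProxy(tree)
first = root.child(0)
```

After this runs, only two proxies exist (the root and its first
child), even though the native tree holds 1001 nodes.  Asking for
root.child(0) again returns the same cached proxy object.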

If you need any more explanation, I would be more than happy to help
you out.

- Dave