[soc2008-general] Proposal for a PEG(parsing expression grammar) parser generator for Python
Robert Bradshaw
robertwb at math.washington.edu
Tue Mar 18 13:55:28 CET 2008
If you're interested in processing Python code, you might want to
consider writing up a project for http://www.cython.org/ . We already
have a parser, though, and we are not looking to completely replace
it, but there are still lots of interesting project ideas.
I can't speak for everyone, though; perhaps there are other people
who would be interested in an actual PEG for Python.
- Robert
On Mar 18, 2008, at 5:23 AM, Chiyuan Zhang wrote:
> Hello,
>
> I'm interested in participating in GSoC 2008. I'm a student at
> Zhejiang University in China, majoring in Computer Science and
> Technology. I'm taking a compilers course this term, where we use the
> classical LALR (look-ahead LR: left-to-right scan, rightmost
> derivation in reverse)[1] approach. But I've also heard of another
> approach to parsing: parsing expression grammar, or PEG[2].
>
> Parsing expression grammars look similar to regular expressions or
> context-free grammars (CFG) in Backus-Naur form (BNF) notation, but
> have a different interpretation. Unlike CFGs, PEGs are not ambiguous:
> the choice operator is ordered and always takes the first alternative
> that matches, so if a string parses, it has exactly one valid parse
> tree. This makes PEGs well suited to parsing computer languages, but
> not natural languages.
>
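To illustrate the point about ambiguity, here is a minimal Python sketch of PEG-style ordered choice. The helper names (`literal`, `choice`) are hypothetical and are not taken from Treetop or any existing library; a parser here is just a function that returns the new position on success or `None` on failure.

```python
# Minimal sketch of PEG ordered choice. All names are hypothetical.

def literal(text):
    """Return a parser that matches `text` exactly at position `pos`."""
    def parse(s, pos):
        if s.startswith(text, pos):
            return pos + len(text)  # position just past the match
        return None  # no match
    return parse


def choice(*parsers):
    """Ordered choice: commit to the first alternative that succeeds."""
    def parse(s, pos):
        for parser in parsers:
            end = parser(s, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse


# 'a' / 'ab': the first alternative matches, so 'ab' is never attempted.
a_or_ab = choice(literal("a"), literal("ab"))
# 'ab' / 'a': same alternatives, different order, different result.
ab_or_a = choice(literal("ab"), literal("a"))

end_first = a_or_ab("ab", 0)
end_second = ab_or_a("ab", 0)
```

Because the order of alternatives decides the outcome, a given input yields at most one parse, which is the property the paragraph above describes.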
> There's a PEG parser generator for Ruby named Treetop[3]. It takes a
> neat DSL approach. Here's part of the example from the Treetop homepage:
>
>   grammar Arithmetic
>     rule additive
>       multitive '+' additive {
>         def value
>           multitive.value + additive.value
>         end
>       }
>       /
>       multitive
>     end
>
>     # other rules below ...
>   end
>
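For comparison, the `additive` rule above can be written by hand in Python as a small recursive-descent function. This is only a sketch under stated assumptions: the `multitive` rule is elided in the excerpt, so the version below is a hypothetical stand-in that matches a single integer, and whitespace is not handled.

```python
import re

_NUMBER = re.compile(r"\d+")


def multitive(s, pos):
    # Assumption: stand-in for the elided rule; matches one integer only.
    m = _NUMBER.match(s, pos)
    if m is None:
        return None
    return int(m.group()), m.end()


def additive(s, pos=0):
    """additive <- multitive '+' additive / multitive

    Returns (value, end_position) on success, or None on failure.
    """
    left = multitive(s, pos)
    if left is None:
        return None  # both alternatives start with multitive
    value, after_left = left

    # First alternative: multitive '+' additive
    if s.startswith("+", after_left):
        right = additive(s, after_left + 1)
        if right is not None:
            right_value, end = right
            return value + right_value, end

    # Second alternative: multitive on its own
    return value, after_left
```

Calling `additive("1+2+3")` evaluates the sum while parsing, much as the `value` method does in the Treetop rule. A generator would produce functions like this from the grammar text instead of requiring them to be written by hand.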
> But there seems to be no Python tool for PEGs (except PyPy's rlib
> parsing[4], a packrat parser generator). So I'd like to implement a
> Treetop-like PEG parser generator for Python as a GSoC project for
> the Python Software Foundation. Is anyone interested in being my
> mentor for this project?
>
> As for myself, I have experience working with open source people. I
> have reported bugs and contributed patches to open source communities.
> I also have some projects of my own. Here are two examples:
>
> * RMMSeg[5]: An implementation of the MMSEG maximum-matching Chinese
> word segmentation algorithm for Ruby.
>
> * YASnippet[6]: Yet another snippet extension for Emacs. It is a
> (much better) replacement for smart-snippet[7] (also my work). It
> provides a simple but powerful template facility like the one
> present in TextMate. If you are an Emacser, you should check it
> out! :D
>
> I have read your expectations for Google Summer of Code students on
> the Python wiki. I think I meet them, except that it is sometimes
> difficult for me to use IRC, mainly because of the time zone
> difference.
>
> So, would you consider this proposal? If yes, I'd be very happy. If
> no, I'm also interested in applying for other projects (with the PSF
> or other mentoring organizations).
>
> --------------
> References:
> [1] LALR on Wikipedia: http://en.wikipedia.org/wiki/LALR_parser
> [2] PEG on Wikipedia:
> http://en.wikipedia.org/wiki/Parsing_expression_grammar
> [3] Treetop project homepage: http://treetop.rubyforge.org/
> [4] PyPy rlib parsing document:
> http://codespeak.net/pypy/dist/pypy/doc/rlib.html#parsing
> [5] RMMSeg project homepage: http://rmmseg.rubyforge.org/
> [6] YASnippet project homepage: http://yasnippet.googlecode.com/
> [7] smart-snippet project homepage:
> http://smart-snippet.googlecode.com/
>
> --------------
> Other links:
> * My Blog (mainly Chinese): http://pluskid.lifegoo.com/
> * My Email address: pluskid at gmail.com