
I remember some time ago people on #pypy were talking about redoing the parser for Python because it was not good enough or something. Also, the parser generator that cfbolz wrote doesn't support Unicode and is not suited to automatic semicolon insertion. I think Unicode support would be good for a JavaScript parser, because the spec calls for it, and maybe it would be good for Python too (I don't know about Prolog/Smalltalk, though). What would be better: a parser generator that supports Unicode, or everyone writing their own recursive descent parser by hand? -- Leonardo Santagada santagada at gmail.com
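For comparison, here is a minimal sketch of what the hand-written route could look like. It is plain Python 3, not RPython, and not code from PyPy or any project mentioned in this thread. The toy grammar, the `Parser` class and the token kinds are all made up for illustration. Unicode identifiers come for free here because `\w` matches Unicode letters in Python 3 string patterns.

```python
import re

# One token per match: a number, a (Unicode) name, or any single other character.
TOKEN = re.compile(r"\s*(?:(\d+)|(\w+)|(.))")


def tokenize(text):
    """Split text into (kind, value) pairs; kinds are 'num', 'name', 'op'."""
    tokens = []
    for number, name, op in TOKEN.findall(text.strip()):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
    return tokens


class Parser:
    """Hypothetical recursive descent parser: one method per grammar rule."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end", "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "end":
            raise SyntaxError("unexpected token %r" % (self.peek()[1],))
        return tree

    def expr(self):
        # expr := term ('+' term)*
        node = self.term()
        while self.peek() == ("op", "+"):
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):
        # term := atom ('*' atom)*
        node = self.atom()
        while self.peek() == ("op", "*"):
            self.advance()
            node = ("*", node, self.atom())
        return node

    def atom(self):
        # atom := NUMBER | NAME | '(' expr ')'
        kind, value = self.advance()
        if kind == "num":
            return ("num", int(value))
        if kind == "name":
            return ("name", value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError("unexpected token %r" % (value,))
```

Calling `Parser("größe + 2 * π").parse()` returns a nested tuple tree in which `*` binds tighter than `+`. A real JavaScript or Python grammar is far larger, which is the cost side of the hand-written option raised above.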

On Feb 26, 2009, at 2:40 PM, Carl Friedrich Bolz wrote:
All the ones that come from Narcissus are actually just licensed by Mozilla, so we should talk to them. But I don't know if it is complete and without bugs (I think there was one with Narcissus, but I don't remember what it was), so maybe we should use one from a functioning JS interpreter like Rhino or JavaScriptCore (SquirrelFish Extreme or whatever its name is now). Both use an incompatible license, and I think it is on purpose... so I don't know how to deal with this. The V8 parser is BSD, so I think it is compatible, but it will be some work to convert it from C++ to RPython (http://code.google.com/p/v8/source/browse/trunk/src/parser.cc ). My final answer would be, "I don't know, what do you guys think?". -- Leonardo Santagada santagada at gmail.com

Perhaps consider ANTLR? We've had good success with it for Jython, and it's now also used by the NetBeans support for Python in general. I took a look at http://www.antlr.org/grammar/list, and there are a number of options for JS. Most importantly, ANTLR supports the parser chain in Python, so it's possible this could be more readily converted to RPython. Some potential issues: Chris Lambrou has a parser for ECMAScript 3.0, but there's no license on it, so you'd definitely have to contact him about this. Like standalone grammars in general, it's rather unlikely to have been extensively tested. With Jython we started with a reference parser that Terence Parr had made, took CPython as the reference for what the AST should be, and over time targeted that by explicitly comparing ASTs. Some time later, after adding incremental parse support and handling for various syntax errors, we're pretty confident that it's basically done. - Jim On Thu, Feb 26, 2009 at 11:29 AM, Leonardo Santagada <santagada@gmail.com> wrote:
-- Jim Baker jbaker@zyasoft.com

On Feb 26, 2009, at 3:44 PM, Jim Baker wrote:
Yep, this could be done. If so, I would use this grammar file, http://research.xebic.com/es3/, and then add the code for JS 1.5+ to it as we start supporting those features.
What would be great is if PyPy and Jython used the same ANTLR-based parser, so the work to support Python 3.0 (and 2.7, 2.8, etc.) could be partially shared :) -- Leonardo Santagada santagada at gmail.com

Please feel free to use our Python parser; it's licensed under the original BSD license from Terence Parr. https://jython.svn.sourceforge.net/svnroot/jython/trunk/jython/grammar/ I don't know where the 3.0 work is currently located. In any event, it's something we have put on hold for the moment while we get the 2.5 release out. I cc-ed Frank to ensure he is in the loop here. In terms of the parser itself: there's a modest amount of Java code in the actions, but that should be easy to convert. Supporting RPython generation in ANTLR then makes even more sense in this case: as I understand it, ANTLR uses TP's other project, StringTemplate, to simplify the construction of multiple backends. - Jim On Thu, Feb 26, 2009 at 12:55 PM, Leonardo Santagada <santagada@gmail.com> wrote:
-- Jim Baker jbaker@zyasoft.com

On Thu, Feb 26, 2009 at 2:55 PM, Leonardo Santagada <santagada@gmail.com> wrote: base grammar (I remember it has a sort of diff-merge form of inheritance, my google skills are failing me, I'll find a reference today sometime I'm sure). At the JVM Language summit last year, I met ANTLR expert Jim Idle, and he expressed an interest in seeing if the Jython grammar could be used as a grammar for CPython. I've copied Jim Idle on this email. As a side note, it appears that Guido van Rossum has had some positive experiences with ANTLR recently: http://www.antlr.org/pipermail/antlr-interest/2009-February/032783.html -Frank

On Friday, 27 February 2009, Frank Wierzbicki wrote:
Andrew Dalke, who is very thorough in his investigation of software, has written some interesting things about his experience with ANTLR as well as some other parsing projects. In short, he likes ANTLR as a tool, but in his application it is considerably slower than some other alternatives. He also has something called python4ply, which is a ready-made, MIT-licensed parser for Python. You can find his articles at http://www.dalkescientific.com/writings/diary/archive/ Jacob Hallén

On Feb 27, 2009, at 3:19 PM, Jacob Hallén wrote:
The problem he might be having is with the Python backend for ANTLR, which neither we (we are going to have to create an RPython one) nor CPython (which would use a C89 one) would use. But this is just a guess, as I have had no time to read his article yet. -- Leonardo Santagada santagada at gmail.com

On Fri, Feb 27, 2009 at 1:19 PM, Jacob Hallén <jacob@openend.se> wrote:
""" It looks like every character read incurs several Python function calls, which are a lot more expensive in Python than in Java or C++. There's no easy change for this, so I'm pretty sure the ANTLR-generated parser always going to be slower than PLY. """ I wonder if producing a parser in C would work for PyPy if this is unavoidable? I know it misses the purpose of PyPy a bit -- just a thought :) -Frank

Hi, On Fri, Feb 27, 2009 at 01:38:39PM -0500, Frank Wierzbicki wrote:
No, our goal would be to generate an RPython parser, not just a Python one. A few function calls are not more of a problem for RPython than they are for C. So that should be fine. In other words, if ANTLR generates parsers expecting the speed of a Java or C++ kind of backend, then an RPython backend is no problem either. A bientot, Armin.
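The per-character call overhead discussed in the last few messages can be observed with a small experiment. The sketch below is not ANTLR's runtime or PLY; `CharStream`, `words_per_char` and `words_regex` are hypothetical names, and the sample input is invented. It only contrasts a scanner that makes a Python-level method call for every lookahead and consume with one that hands the whole scan to the `re` module, which is roughly the difference Dalke describes. It says nothing about RPython, where, as Armin notes, such calls are compiled and cost about what they would in C.

```python
import re
import timeit


class CharStream:
    """Hypothetical stream: every lookahead and consume is a method call."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def la(self):
        # Return the current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self):
        self.pos += 1


def words_per_char(text):
    """Collect alphanumeric runs, paying two method calls per character."""
    stream = CharStream(text)
    words, current = [], []
    while True:
        ch = stream.la()
        if ch.isalnum():
            current.append(ch)
        else:
            if current:
                words.append("".join(current))
                current = []
            if ch == "":
                break
        stream.consume()
    return words


def words_regex(text):
    """Collect the same runs with one call into the regex engine.

    The character class is ASCII-only, which matches str.isalnum()
    on the ASCII sample used here.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


SAMPLE = "foo = bar + baz42\n" * 1000

if __name__ == "__main__":
    for func in (words_per_char, words_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=5)
        print("%s: %.4f s" % (func.__name__, seconds))
```

Both functions return the same list for the sample input, so any timing difference comes from how the characters are visited rather than from the result. The actual numbers depend on the interpreter and machine.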

participants (7)
- Armin Rigo
- Carl Friedrich Bolz
- Frank Wierzbicki
- holger krekel
- Jacob Hallén
- Jim Baker
- Leonardo Santagada