[pypy-dev] Parsing in PyPy (and runicode)

Leonardo Santagada santagada at gmail.com
Mon Mar 2 16:00:03 CET 2009

On Feb 27, 2009, at 7:36 PM, Jim Idle wrote:

> Leonardo Santagada wrote:
>> On Feb 27, 2009, at 3:19 PM, Jacob Hallén wrote:
>>> On Friday, 27 February 2009, Frank Wierzbicki wrote:
>>>> On Thu, Feb 26, 2009 at 2:55 PM, Leonardo Santagada <santagada at gmail.com>
>>> Andrew Dalke, who is very thorough in his investigation of software, has
>>> written some interesting things about his experience with ANTLR as well as
>>> some other parsing projects. In short, he likes ANTLR as a tool, but in his
>>> application, it is considerably slower than some other alternatives.
>>> He also has something called python4ply, which is a ready, MIT licensed
>>> parser for Python.
>>> You can find his articles on
>>> http://www.dalkescientific.com/writings/diary/archive/
>> The problem he might be having is with the Python backend for
>> ANTLR, which neither we (we are going to have to create an RPython
>> one) nor CPython (which would use a C89 one) would have. But this
>> is just a guess, as I have had no time to read his article yet.
> All I can find is info about using the Python backend.  
> Unfortunately, the Python backend is very slow. There was some  
> discussion between the Python runtime author and Guido about why - I  
> can't re-quote as it was private email, but basically the runtime  
> and the generated code are method-call heavy, which isn't a good  
> idea in Python. Also, as string handling and other things are not  
> particularly quick, it is hard to get Python to perform when running  
> a parser using a design that wasn't specifically tailored for  
> Python. After all, Python wasn't really aimed at writing things like  
> lexers and parsers and is much better at things in other domains.  
> All that said, I think that the Python runtime will get better as  
> there will be more expansive choices for backend runtime authors in  
> the future. Then again, is the speed of the parser going to be a  
> factor?

Not much, I think. Well, I did take a look at both the support code and
the generated code (and a quick look at the runtime). The support code
in Java is really crazy; at least for me, I didn't get most of it (I
should read the docs later).

Now the generated code seems to follow the Java backend, so far as
being almost RPython. There is a problem that RPython doesn't have
sets, so it will take some time to make it work... but I think it is
doable. Somehow my generated code has "pass" before every block of
code in both parsers and lexers; the reason for that is still a
mystery to me (does anyone know why?).
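For the sets issue, the usual workaround is to fake a set with a dict
whose values are ignored, since RPython does support dicts. A minimal
sketch (the names here are mine, not from the ANTLR-generated code):

```python
# Sketch: emulating a set with a dict, since RPython has no set type.
# Only key membership matters; the values are never used.

def make_set(items):
    d = {}
    for item in items:
        d[item] = None
    return d

def set_contains(d, item):
    return item in d

# e.g. token types in a FOLLOW set
follow_set = make_set([10, 11, 12])
print(set_contains(follow_set, 11))  # True
print(set_contains(follow_set, 99))  # False
```

Membership tests on dicts translate fine, so the generated "x in
someset" checks could stay almost unchanged.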

Maybe this weekend I will have more time to look at/work on this more
seriously.

> The Java runtime is a lot quicker than the Python runtime and unless  
> there are Python translation units with 25,000 lines (there will be  
> somewhere ;-), performance would not be a factor.
> When I wrote the C runtime however, I did not make a blind copy of  
> the Java runtime, hence the performance is akin to hand written  
> code. For instance the GNU C parser written for ANTLR and running  
> with the C runtime, is almost the same speed as the GNU C parser  
> itself.

This is great, but is the code C89, or do you use something from a
newer standard? Because the only way to have a shot at using it with
CPython would be if it were C89... though using the same parser in
Jython and PyPy would be cool enough.

> None of this would help (in terms of ANTLR) if you want a Python  
> parser that runs in Python of course :-)

Well, knowing that the Java one is quick is a good indication that the
RPython one can be quick as well.

Thanks for all the info and for the quick response,
Leonardo Santagada
santagada at gmail.com
