[pypy-dev] Parsing in PyPy (and runicode)
santagada at gmail.com
Mon Mar 2 16:00:03 CET 2009
On Feb 27, 2009, at 7:36 PM, Jim Idle wrote:
> Leonardo Santagada wrote:
>> On Feb 27, 2009, at 3:19 PM, Jacob Hallén wrote:
>>> fredagen den 27 februari 2009 skrev Frank Wierzbicki:
>>>> On Thu, Feb 26, 2009 at 2:55 PM, Leonardo Santagada <santagada at gmail.com
>>> Andrew Dalke, who is very thorough in his investigation of software, has
>>> written some interesting things about his experience with ANTLR as well as
>>> some other parsing projects. In short, he likes ANTLR as a tool, but in his
>>> application, it is considerably slower than some other alternatives.
>>> He also has something called python4ply, which is a ready-made,
>>> MIT-licensed parser for Python.
>>> You can find his articles on
>> The problem he might be having is with the Python backend for
>> ANTLR, which neither we (we are going to have to create an RPython
>> one) nor CPython (which would use a C89 one) would use. But this
>> is just a guess, as I have had no time to read his article yet,
> All I can find is info about using the Python backend.
> Unfortunately, the Python backend is very slow. There was some
> discussion between the Python runtime author and Guido about why - I
> can't re-quote as it was private email, but basically the runtime
> and the generated code are method-call heavy, which isn't a good
> idea in Python. Also, as string handling and other things are not
> particularly quick, it is hard to get Python to perform when running
> a parser using a design that wasn't specifically tailored for
> Python. After all, Python wasn't really aimed at writing things like
> lexers and parsers and is much better at things in other domains.
> All that said, I think that the Python runtime will get better as
> there will be more expansive choices for backend runtime authors in
> the future. Then again, is the speed of the parser going to be a
Not much, I think. Well, I did take a look at both the support code and
the generated code (and a quick look at the runtime). The support code
in Java is really crazy; at least I didn't understand most of it (I
should read the docs later).
Now, the generated code seems to follow the Java backend closely enough
to be almost RPython. One problem is that RPython doesn't have sets, so
it will take some time to make it work... but I think it is doable.
Somehow my generated code has "pass" before every block of code in both
parsers and lexers; the reason for that is still a mystery to me (does
anyone know why?).
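
To make the point about sets concrete, here is a minimal sketch of how a
set could be emulated with a plain dict, assuming dicts with integer keys
are available in the restricted subset. The class name DictSet and the
FOLLOW_* token-type sets are hypothetical names made up for illustration;
they are not taken from the ANTLR runtime or from PyPy.

```python
class DictSet(object):
    """Set-like container backed by a dict; the keys are the members."""

    def __init__(self, items=None):
        self._items = {}
        if items is not None:
            for item in items:
                self._items[item] = None

    def add(self, item):
        # Adding an existing member is a no-op, as with a real set.
        self._items[item] = None

    def contains(self, item):
        return item in self._items

    def union(self, other):
        # Build a new DictSet holding the members of both operands.
        result = DictSet()
        for item in self._items:
            result.add(item)
        for item in other._items:
            result.add(item)
        return result

    def size(self):
        return len(self._items)


# Hypothetical token types, standing in for the follow sets that
# generated parser code might consult.
FOLLOW_expr = DictSet([4, 7, 12])
FOLLOW_stmt = DictSet([7, 9])
combined = FOLLOW_expr.union(FOLLOW_stmt)
```

Something like this would only need to cover the handful of set
operations the generated code actually uses, so the real work is probably
in finding out which ones those are.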
Maybe this weekend I will have more time to look/work more seriously
> The Java runtime is a lot quicker than the Python runtime and unless
> there are Python translation units with 25,000 lines (there will be
> somewhere ;-), performance would not be a factor.
> When I wrote the C runtime however, I did not make a blind copy of
> the Java runtime, hence the performance is akin to hand-written
> code. For instance the GNU C parser written for ANTLR and running
> with the C runtime, is almost the same speed as the GNU C parser
This is great, but is the code C89 or do you use something from a
newer standard? The only way to have a shot at using it with
CPython would be if it was C89... though using the same parser in
Jython and PyPy would be cool enough.
> None of this would help (in terms of ANTLR) if you want a Python
> parser that runs in Python of course :-)
Well, knowing that the Java one is quick is a good indication that the
RPython one can be quick too.
Thanks for all the info and for the quick response,
santagada at gmail.com