[pypy-dev] Compiler

Armin Rigo arigo at tunes.org
Sun May 8 11:26:59 CEST 2005


Hi Ludovic, hi all,

I had a look at recparser, and how to integrate it into PyPy.  Ideally,
it can be exported as the 'parser' module by adding a line to
interpreter/baseobjspace.py (see the commented-out line about the other
'parser').  A few comments about the interface file pyparser.py (this
should be put in some documentation...):

* applevel() requires obscure tweaking of the 'import compiler'
  statement, to prevent the whole compiler package from being dragged in
  and compiled by PyPy (which may be what we want later, but for now I
  expect it just doesn't work).  I checked that in.

* the interpleveldef exports a class, 'STType'.  I added another hack in
  lazymodule.py to make that work.  Basically, the interp-level exports
  had to be wrapped objects, or functions -- which get wrapped
  automatically.  Types now also get wrapped automatically.  Previously,
  you'd have needed an interpleveldef like
  
     'STType': 'space.gettypeobject(pyparser.STType.typedef)'

  which fishes the typedef (i.e. the definition of the app-level type)
  corresponding to the class STType, and asks the space to build a real
  app-level type object for it.

At the moment, with the above changes, it appears to work rather nicely
(at least the few exported methods).  But we cannot feed the parse
tuples to the pure Python compiler package because the latter expects
tuples with line number information, and as far as I can see you're
always generating tuples without it.  It seems that you're collecting
the information already so it should not be difficult to fix.


The next step would be to integrate it so that it is used by the
built-ins, like compile().  There is a new abstraction, class Compiler,
in pypy.interpreter.compiler.  Its purpose is to be subclassed by
concrete compilers; currently there is only CPythonCompiler, which
cheats and calls compile() at interpreter-level.  I guess that it should
be possible to create another subclass that uses recparser and the pure
Python compiler package to do its job, or even a generic PythonCompiler
that uses whatever built-in 'parser' module is available, and then the
pure Python compiler package.

All of PyPy ends up using the compiler instance that is stored in the
current execution context whenever it needs to compile source code
(including at the interactive prompt).
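
To make the shape of this abstraction concrete, here is a minimal
stand-alone sketch.  It is not the code in pypy.interpreter.compiler:
the method signature, the HostCompiler name, the ExecutionContext
stand-in and run_source() are all assumptions made up for illustration.
It only shows the idea of an abstract compiler class, one subclass that
"cheats" by delegating to the host compile(), and callers that always
go through the instance stored in the execution context.

```python
class Compiler:
    """Abstract base class; concrete compilers override compile().
    (Hypothetical signature, chosen to mirror the built-in compile().)"""

    def compile(self, source, filename, mode):
        raise NotImplementedError


class HostCompiler(Compiler):
    """Illustrative stand-in for the 'cheating' strategy: delegate to
    the compile() built-in of the Python running this code."""

    def compile(self, source, filename, mode):
        # Inside a method, 'compile' resolves to the built-in, not to
        # this method, because class scope is not an enclosing scope.
        return compile(source, filename, mode)


class ExecutionContext:
    """Hypothetical holder for the currently active compiler."""

    def __init__(self, compiler):
        self.compiler = compiler


def run_source(ec, source):
    # Callers never pick a compiler themselves; they use whichever
    # instance the execution context carries.
    code = ec.compiler.compile(source, '<input>', 'eval')
    return eval(code)


ec = ExecutionContext(HostCompiler())
result = run_source(ec, '6 * 7')
```

Swapping in a compiler based on recparser would then mean writing
another subclass and storing an instance of it in the execution
context, without touching the callers.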


Finally, a quick look over the recparser sources shows a few constructs
that are clearly not "RPython", i.e. too dynamic.  We need to think a
bit and see how to address the issue.  About RPython:
http://codespeak.net/pypy/index.cgi?doc/coding-style.html#restricted-python

Before we actually try to perform type inference on recparser, it's a
bit hard to know if there are type problems or not.  It is often the
case that even when we write code knowing that it should be RPython we
overlook some subtle typing problem.  I'll give it a try, I guess (this
is done by enabling the recparser module in baseobjspace as hinted
above, running "dist/goal/translate_pypy.py targetpypy", and trying to
make sense out of the obscure assertion errors and enormous flow graphs
we get...)

For now, one obviously problematic feature is the visitor pattern
that you use extensively.  It's definitely a great pattern, but not one
that immediately applies to C- or Java-like languages.  I'm not saying
that you should rewrite all of recparser; more that we need to find a
trick to implement visitor patterns without the getattr() with a
computed attribute name.  Possibly something along these lines:

    class MyVisitor:
        def visit_name1(self, node):
            ...
        def visit_name2(self, node):
            ...

        # this can be computed by a for loop instead:
        VISIT_MAP = {'name1': visit_name1,
                     'name2': visit_name2,
                    }
    
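    class Node:
        # assumes each node has a 'name' attribute that is a VISIT_MAP key
        def visit(self, visitor):
            # constant-dict lookup instead of getattr() with a computed name
            visit_meth = visitor.VISIT_MAP[self.name]
            visit_meth(visitor, self)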

The difference with the getattr() case is that the operation that
replaces it, a getitem on a constant dictionary, has a reasonable
C-level equivalent, namely a (precomputed) hash table lookup.
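
A runnable sketch of the same idea, with the map computed by a loop
rather than written out by hand.  The names CountingVisitor,
build_visit_map() and the 'children' attribute are made up for this
example and are not taken from recparser.  The assumption is that the
map is built once, when the module is imported, so that only the plain
dictionary lookup in Node.visit() remains at run time.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def visit(self, visitor):
        # getitem on a precomputed dictionary, no getattr() involved
        visit_meth = visitor.VISIT_MAP[self.name]
        return visit_meth(visitor, self)


class CountingVisitor:
    """Hypothetical visitor that records the order of visited nodes."""

    def __init__(self):
        self.seen = []

    def visit_name1(self, node):
        self.seen.append('name1')
        for child in node.children:
            child.visit(self)

    def visit_name2(self, node):
        self.seen.append('name2')


def build_visit_map(cls):
    """Collect every 'visit_xxx' function defined on cls under 'xxx'."""
    prefix = 'visit_'
    return {name[len(prefix):]: func
            for name, func in vars(cls).items()
            if name.startswith(prefix) and callable(func)}


# Done once, right after the class definition.
CountingVisitor.VISIT_MAP = build_visit_map(CountingVisitor)

tree = Node('name1', [Node('name2'), Node('name1', [Node('name2')])])
visitor = CountingVisitor()
tree.visit(visitor)
```

The dynamic part (scanning the class for method names) is confined to
build_visit_map(); the code that runs for each node only indexes a
dictionary whose contents no longer change.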


That's it for now.  Don't hesitate to ask if I'm not making sense, or
for more help about integration issues.  I am aware that it is some kind
of guesswork at the moment.  Just feel free to post to pypy-dev.


A bientot,

Armin.


